{{Short description|Programming language}}
{{Infobox programming language
|name = Sawzall
|logo =
|paradigm =
|year = {{Start date and age|2003}}
|designer =
|developer = [[Google]]
|latest_release_version =
|latest_release_date =
|typing =
|implementations =
|dialects =
|influenced_by =
|influenced =
|current version =
|operating_system =
|license = [[Apache License 2.0]]
|website = {{URL|https://code.google.com/archive/p/szl/}}
}}
'''Sawzall''' is a procedural [[Domain-specific language|___domain-specific]] [[programming language]], used by [[Google]] to process large numbers of individual [[log file|log]] records. Sawzall was first described in 2003,<ref>Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. [http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/sv//archive/sawzall-sciprog.pdf Interpreting the Data: Parallel Analysis with Sawzall]</ref> and the szl runtime was open-sourced in August 2010.<ref>[http://code.google.com/p/szl/ Sawzall's open source project at Google Code].</ref> However, because the [[MapReduce]] table aggregators have not been released,<ref name="open-source-scope"/> the open-sourced runtime cannot be used off-the-shelf for large-scale analysis of multiple log files. Within Google, Sawzall has been replaced for most purposes by Lingo (logs in [[Go (programming language)|Go]]).<ref>{{cite web|url=http://www.unofficialgoogledatascience.com/2015/12/replacing-sawzall-case-study-in-___domain.html|title=Replacing Sawzall|date=2015-12-04|access-date=2018-06-18}}</ref>
 
==Motivation==
Google's server logs are stored as large collections of records ([[Protocol Buffers]]) that are partitioned over many disks within [[Google File System|GFS]]. To perform calculations over the logs, engineers can write [[MapReduce]] programs in C++ or Java. MapReduce programs must be compiled and may be more verbose than necessary, so writing a program to analyze the logs can be time-consuming. To make quick scripts easier to write, [[Rob Pike]] et al. developed the Sawzall language. A Sawzall script runs within the Map phase of a MapReduce and "emits" values to tables. The Reduce phase, which the script writer does not need to handle, then aggregates the tables from multiple runs into a single set of tables.
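The division of labour between the Map and Reduce phases can be sketched in Go, the language of Sawzall's successor, Lingo. This is a minimal, hypothetical model, not Google's implementation, and it rests on several assumptions:
* Input records are plain float64 values.
* `mapShard` stands in for the Map phase running a script over one shard.
* `reduce` stands in for the Reduce phase, merging per-shard tables by summation.
* The `Tables` type and the table names `count` and `total` are illustrative inventions. `sum_of_squares` mirrors the table used in the article's example script.

```go
package main

import "fmt"

// Tables maps a hypothetical output-table name to its running sum,
// standing in for Sawzall "table sum" aggregators.
type Tables map[string]float64

// emit adds v to the named table, loosely mirroring `emit t <- v;`.
func (t Tables) emit(name string, v float64) {
	t[name] += v
}

// mapShard plays the role of the Map phase: it runs a per-record
// "script" over one shard of input and collects the emitted values.
func mapShard(shard []float64) Tables {
	out := Tables{}
	for _, x := range shard {
		// Each record is processed on its own, with no shared state
		// between records other than the output tables.
		out.emit("count", 1)
		out.emit("total", x)
		out.emit("sum_of_squares", x*x)
	}
	return out
}

// reduce plays the role of the Reduce phase: it merges the partial
// tables from every shard into a single set of tables.
func reduce(parts []Tables) Tables {
	merged := Tables{}
	for _, p := range parts {
		for name, v := range p {
			merged[name] += v
		}
	}
	return merged
}

func main() {
	shards := [][]float64{{1, 2}, {3}, {4, 5}}
	var parts []Tables
	for _, s := range shards {
		parts = append(parts, mapShard(s))
	}
	result := reduce(parts)
	fmt.Println(result["count"], result["total"], result["sum_of_squares"]) // prints 5 15 55
}
```

Because every table in this sketch is a sum, partial results can be merged in any order. That property is what lets a reduce step combine tables from many independent runs without the script author coordinating them.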
 
Only the language runtime (which runs a Sawzall script once over a single input) has been open-sourced; the supporting program built on MapReduce has not been released.<ref name="open-source-scope">[http://groups.google.com/group/szl-users/browse_thread/thread/c0d90423d0fc27bd Discussion on which parts of Sawzall are open-source].</ref>
 
==Features==
Sawzall's design favors efficiency and engine simplicity over power:
* Sawzall is statically typed, and the engine compiles the script to [[x86]] machine code before running it.
* Sawzall supports the [[compound data type]]s lists, maps, and structs. However, there are no references or pointers: all assignments and function arguments create copies. Consequently, [[recursive data structure]]s and cycles cannot be constructed.
* As in C, functions can modify [[global variable]]s and [[local variable]]s, but they are not closures.
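The copy-on-assignment rule in the list above can be illustrated by analogy in Go, whose struct and fixed-size array types have the same value semantics. This is not Sawzall code, and the `Record` type and `bump` function are hypothetical. Unlike Sawzall, Go also has pointers, slices and maps that share underlying data, so the sketch deliberately uses only a struct containing an array.

```go
package main

import "fmt"

// Record is a hypothetical compound value. Go copies structs and
// fixed-size arrays on assignment and when passing arguments, which
// mirrors Sawzall's rule that every assignment makes a copy.
type Record struct {
	Name   string
	Counts [3]int // an array, not a slice, so it is copied too
}

// bump receives its own copy of r; the caller's value is unchanged.
func bump(r Record) Record {
	r.Counts[0]++
	return r
}

func main() {
	a := Record{Name: "a"}
	b := a           // b is an independent copy of a
	b.Counts[0] = 10 // modifies only b
	c := bump(a)     // a is copied into bump's parameter
	fmt.Println(a.Counts[0], b.Counts[0], c.Counts[0]) // prints 0 10 1
}
```

The same reasoning explains the restriction on recursive types. Without a pointer or reference, a struct cannot contain a value of its own type, and Go likewise rejects `type Node struct { Next Node }` as an invalid recursive type.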
 
==Sawzall code==
emit sum_of_squares <- x * x;
 
== See also ==
* [[Pig (programming tool)|Pig]] – similar tool and language for use with [[Apache Hadoop]]
* [[Sawmill (software)]]
 
== References ==
{{Reflist}}
{{More footnotes|date=April 2011}}
 
== Further reading ==
* S. Ghemawat, H. Gobioff, S.-T. Leung, The Google file system, in: 19th ACM Symposium on Operating Systems Principles, Proceedings, 17 ACM Press, 2003, pp.&nbsp;29–43.
 
== External links ==
* [https://code.google.com/archive/p/szl/ szl project] on the Google Code Archive
* [[MapReduce]]: [https://web.archive.org/web/20110604204310/http://www.soe.ucsc.edu/classes/cmps253/Spring07/notes/mapreduce.pdf course notes] (UCSC CMPS 253, Spring 2007; archived)
 
{{compu-prog-stub}}
{{Software-type-stub}}
{{Rob Pike navbox}}
{{Google FOSS}}
 
[[Category:Domain-specific programming languages]]
[[Category:Procedural programming languages]]
[[Category:Google software]]
[[Category:Programming languages created in 2003]]
[[Category:Software using the Apache license]]