{{Short description|Programming language}}
{{refimprove|date=April 2011}}
{{Infobox programming language
|name = Sawzall
|logo =
|paradigm =
|year = {{Start date and age|2003}}
|designer =
|developer = [[Google]]
|latest_release_version =
|latest_release_date =
|typing =
|implementations =
|dialects =
|influenced_by =
|influenced =
|current version =
|operating_system =
|license = [[Apache License 2.0]]
|website = {{URL|https://code.google.com/archive/p/szl/}}
}}
'''Sawzall''' is a procedural [[___domain-specific language|___domain-specific]] [[programming language]], used by [[Google]] to process large numbers of individual [[log file|log]] records. Sawzall was first described in 2003,<ref>Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. [http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/sv//archive/sawzall-sciprog.pdf Interpreting the Data: Parallel Analysis with Sawzall]</ref> and the szl runtime was open-sourced in August 2010.<ref>[http://code.google.com/p/szl/ Sawzall's open source project at Google Code].</ref> However, because the [[MapReduce]] table aggregators have not been released,<ref name="open-source-scope"/> the open-sourced runtime cannot, on its own, be used for large-scale analysis spanning many log files. Within Google, Sawzall has been replaced for most purposes by Lingo (logs in [[Go (programming language)|Go]]).<ref>{{cite web|url=http://www.unofficialgoogledatascience.com/2015/12/replacing-sawzall-case-study-in-___domain.html|title=Replacing Sawzall|date=2015-12-04|access-date=2018-06-18}}</ref>
==Motivation==
Google's server logs are stored as large collections of records ([[Protocol Buffers]]) partitioned over many disks within [[Google File System|GFS]]. To perform calculations over the logs, engineers can write [[MapReduce]] programs in C++ or Java. Such programs must be compiled and tend to be verbose, so writing one for a quick analysis can be time-consuming. To make quick analysis scripts easier to write, [[Rob Pike]] et al. developed the Sawzall language. A Sawzall script runs within the Map phase of a MapReduce and "emits" values to tables. The Reduce phase, which the script writer does not need to handle, then aggregates the tables from all the runs into a single set of tables.
Only the language runtime (which runs a Sawzall script once over a single input) was open-sourced; the supporting program built on MapReduce was not released.<ref name="open-source-scope">[http://groups.google.com/group/szl-users/browse_thread/thread/c0d90423d0fc27bd Discussion on which parts of Sawzall are open-source].</ref>
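The division of labour between the script and the Reduce phase can be illustrated with a small Python model. This is a minimal sketch under stated assumptions, not Google's implementation: each shard's records are passed through a script whose only output channel is a hypothetical <code>emit</code> callback, every table is treated as a <code>sum</code> table, and the names <code>run_map</code>, <code>run_reduce</code> and <code>count_errors</code> (and the <code>status</code> field) are invented for illustration.

```python
from collections import Counter
from typing import Callable, Iterable

# Hypothetical model: a "script" receives one record plus an emit callback,
# and emitting is its only way to produce output.
Emit = Callable[[str, int], None]
Script = Callable[[dict, Emit], None]


def run_map(script: Script, records: Iterable[dict]) -> Counter:
    """Map phase: run the script once per record, summing emits per table."""
    tables: Counter = Counter()

    def emit(table: str, value: int) -> None:
        tables[table] += value

    for record in records:
        script(record, emit)
    return tables


def run_reduce(partials: Iterable[Counter]) -> Counter:
    """Reduce phase: merge per-shard sum tables into one set of totals."""
    merged: Counter = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return merged


def count_errors(record: dict, emit: Emit) -> None:
    # Hypothetical script: count all requests and those with status >= 500.
    emit("requests", 1)
    if record["status"] >= 500:
        emit("errors", 1)


shards = [
    [{"status": 200}, {"status": 503}],
    [{"status": 404}, {"status": 500}, {"status": 200}],
]
result = run_reduce(run_map(count_errors, shard) for shard in shards)
print(dict(result))  # {'requests': 5, 'errors': 2}
```

Because each shard produces its partial tables independently and summation is associative, the Reduce step in this model can merge the partial results in any order.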
==Features==
Notable features include:
* A Sawzall script has a single input (a log record) and can produce output only by emitting to tables; it can have no other side effects.
* A script can define any number of output tables. Table types include:
** <code>collection</code> saves every value emitted
** <code>sum</code> saves the sum of every emitted value
** <code>maximum(n)</code> keeps only the n values with the highest weights
* Several statistical table types give approximate results; the larger the parameter n, the more accurate the estimate.
** <code>sample(n)</code> gives a random sample of n values from all the emitted values
** <code>quantile(n)</code> approximates the cumulative probability distribution (quantiles) of the emitted numbers.
** <code>top(n)</code> gives n values that are probably the most frequent of the emitted values.
** <code>unique(n)</code> estimates the number of unique values emitted.
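The behaviour of several of these table types can be approximated in Python. The sketch below is illustrative only: the class names are hypothetical, <code>maximum(n)</code> is modelled with a min-heap, and <code>sample(n)</code> is modelled with reservoir sampling (Algorithm R), which is an assumed technique rather than the one Sawzall's runtime is documented to use.

```python
import heapq
import random
from typing import Any, List, Tuple


class SumTable:
    """Models a `sum` table: only the running total is kept."""

    def __init__(self) -> None:
        self.total = 0

    def emit(self, value: float) -> None:
        self.total += value


class CollectionTable:
    """Models a `collection` table: every emitted value is kept."""

    def __init__(self) -> None:
        self.values: List[Any] = []

    def emit(self, value: Any) -> None:
        self.values.append(value)


class MaximumTable:
    """Models `maximum(n)`: keeps the n values with the largest weights."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.heap: List[Tuple[float, int, Any]] = []  # min-heap keyed on weight
        self.seq = 0  # insertion counter; breaks ties so values are never compared

    def emit(self, value: Any, weight: float) -> None:
        item = (weight, self.seq, value)
        self.seq += 1
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, item)
        elif weight > self.heap[0][0]:  # beats the current smallest weight
            heapq.heapreplace(self.heap, item)

    def result(self) -> List[Any]:
        # Highest weight first.
        return [value for _, _, value in sorted(self.heap, reverse=True)]


class SampleTable:
    """Models `sample(n)` with reservoir sampling (Algorithm R); an assumed
    technique, not necessarily what Sawzall's runtime does."""

    def __init__(self, n: int, rng: random.Random) -> None:
        self.n = n
        self.rng = rng
        self.seen = 0
        self.reservoir: List[Any] = []

    def emit(self, value: Any) -> None:
        self.seen += 1
        if len(self.reservoir) < self.n:
            self.reservoir.append(value)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.n:  # happens with probability n / seen
                self.reservoir[j] = value


sums, pages, slowest = SumTable(), CollectionTable(), MaximumTable(2)
sample = SampleTable(3, random.Random(0))
for page, latency_ms in [("a", 120), ("b", 40), ("c", 300), ("d", 90)]:
    sums.emit(latency_ms)
    pages.emit(page)
    slowest.emit(page, weight=latency_ms)
    sample.emit(page)

print(sums.total, pages.values, slowest.result())  # 550 ['a', 'b', 'c', 'd'] ['c', 'a']
```

In this model, partial <code>sum</code>, <code>collection</code> and <code>maximum</code> tables from different shards can be merged by adding totals, concatenating lists and reselecting the n heaviest entries, respectively.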
Sawzall's design favors efficiency and engine simplicity over power:
* Sawzall is statically typed, and the engine compiles each script to native [[x86]] machine code before running it.
* Sawzall supports the [[compound data type]]s lists, maps, and structs, but has no references or pointers: all assignments and function arguments create copies. As a result, [[recursive data structure]]s and cycles are impossible.
* As in C, functions can modify [[global variable]]s and [[local variable]]s, but they are not closures.
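The copy-on-assignment rule can be emulated in Python, whose lists are normally shared by reference. In this minimal sketch the helpers <code>assign</code> and <code>duplicate_last</code> are hypothetical, and <code>copy.deepcopy</code> merely stands in for Sawzall's value semantics; the example shows that a function argument cannot alias the caller's data and that adding a structure to itself yields a finite snapshot rather than a cycle.

```python
import copy


def assign(value):
    """Hypothetical helper: emulate Sawzall-style assignment, which copies
    the value instead of sharing a reference to it."""
    return copy.deepcopy(value)


def duplicate_last(items):
    # The argument is already a private copy, so this cannot affect the caller.
    items.append(items[-1])
    return items


original = [1, 2]
copied = assign(original)  # an independent list, not a second name
copied.append(3)
print(original, copied)  # [1, 2] [1, 2, 3]

result = duplicate_last(assign(original))
print(original, result)  # [1, 2] [1, 2, 2]

# "Appending a list to itself" under copy semantics appends a snapshot,
# so the structure stays finite instead of becoming a cycle.
nested = assign(original)
nested.append(assign(nested))
print(nested)  # [1, 2, [1, 2]]
```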
==Sawzall code==
emit sum_of_squares <- x * x;
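Summed tables such as <code>sum_of_squares</code> are typically combined after the Reduce phase. The Python sketch below assumes two additional <code>sum</code> tables, a record count and a running total, which do not appear in the line above; the function names are hypothetical. From the three sums it derives the population mean and variance using <math>\operatorname{Var}(x) = \tfrac{1}{n}\sum x^2 - \bar{x}^2</math>.

```python
from typing import Iterable, Tuple


def summarize(values: Iterable[float]) -> Tuple[int, float, float]:
    """Mimic three sum tables (count, total, sum_of_squares) over all records."""
    count, total, sum_of_squares = 0, 0.0, 0.0
    for x in values:  # one iteration per input record
        count += 1  # assumed analogue of emitting 1 to a count table
        total += x  # assumed analogue of emitting x to a total table
        sum_of_squares += x * x  # analogue of: emit sum_of_squares <- x * x;
    return count, total, sum_of_squares


def mean_and_variance(count: int, total: float, sum_of_squares: float) -> Tuple[float, float]:
    """Population mean and variance computed from the three aggregated sums."""
    mean = total / count
    variance = sum_of_squares / count - mean * mean
    return mean, variance


n, t, sq = summarize([1.0, 2.0, 3.0, 4.0])
print(n, t, sq)  # 4 10.0 30.0
print(mean_and_variance(n, t, sq))  # (2.5, 1.25)
```

This one-pass formula suits emit-only scripts because each sum can be aggregated independently, but it can lose precision when the variance is small relative to the mean.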
== See also ==
* [[Pig (programming tool)|Pig]] – similar tool and language for use with [[Apache Hadoop]]
* [[Sawmill (software)]]
* [http://www.soe.ucsc.edu/classes/cmps253/Spring07/notes/mapreduce.pdf MapReduce]
== References ==
{{Reflist}}
== Further reading ==
* S. Ghemawat, H. Gobioff, S.-T. Leung, "The Google file system", in: Proceedings of the 19th ACM Symposium on Operating Systems Principles, ACM Press, 2003, pp. 29–43.
== External links ==
* [https://code.google.com/archive/p/szl/ Sawzall (szl) project] at the Google Code Archive
{{Rob Pike navbox}}
{{Google FOSS}}
[[Category:Domain-specific programming languages]]
[[Category:Procedural programming languages]]
[[Category:Google software]]
[[Category:Programming languages created in 2003]]
[[Category:Software using the Apache license]]