Sawzall (programming language): Difference between revisions

 
==Motivation==
Google's server logs are stored as large collections of records ([[protocol buffers]]) partitioned over many disks within [[Google File System|GFS]]. To perform calculations over the logs, engineers can write [[MapReduce]] programs in C++ or Java. These programs must be compiled and may be more verbose than necessary, so writing one to analyze the logs can be time-consuming. To make it easier to write quick scripts, [[Rob Pike]] et al. developed the Sawzall language. A Sawzall script runs within the Map phase of a MapReduce and "emits" values to tables. The Reduce phase, which the script writer does not need to be concerned with, then aggregates the tables from multiple runs into a single set of tables.
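
The division of labor described above can be illustrated with a minimal Python sketch. It is an analogy only and is not Sawzall syntax or Google's implementation: the function names (<code>run_script</code>, <code>map_phase</code>, <code>reduce_phase</code>), the table names, and the sample data are all hypothetical, and every table is assumed to be a simple summing table.

```python
from collections import defaultdict


def run_script(record, emit):
    """Per-record logic, standing in for one run of a script over one input."""
    x = float(record)
    emit("count", 1)               # one more record seen
    emit("total", x)               # running sum of the values
    emit("sum_of_squares", x * x)  # running sum of the squared values


def map_phase(shard):
    """Run the script once per record in a shard and collect what it emits."""
    tables = defaultdict(float)

    def emit(table, value):
        # Assumption: every table is a summing table.
        tables[table] += value

    for record in shard:
        run_script(record, emit)
    return dict(tables)


def reduce_phase(partials):
    """Merge the per-shard tables into a single set of tables."""
    merged = defaultdict(float)
    for tables in partials:
        for name, value in tables.items():
            merged[name] += value
    return dict(merged)


# Hypothetical input: three shards of records stored as text.
shards = [["1", "2"], ["3"], ["4", "5"]]
result = reduce_phase(map_phase(shard) for shard in shards)
print(result)
```

Only <code>run_script</code> corresponds to what a script author would write in this model; the other two functions stand in for the surrounding framework, which is why the author does not need to think about aggregation.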
 
Currently, only the language runtime, which runs a Sawzall script once over a single input, has been open-sourced; the supporting program built on MapReduce has not been released.<ref>[http://groups.google.com/group/szl-users/browse_thread/thread/c0d90423d0fc27bd Discussion on which parts of Sawzall are open-source]</ref>