Parallelization contract: Difference between revisions

m "exampe" -> "example"
Line 68:
In contrast to MapReduce, PACT uses a more generic data model of records ([[PactRecord|Pact Record]]) to pass data between functions. The Pact Record can be thought of as a tuple with a free schema. The interpretation of the fields of a record is up to the user function. A Key/Value pair (as in MapReduce) is a special case of that record with only two fields (the key and the value).
 
For input contracts that operate on keys (like //Reduce//, //Match//, or //CoGroup//), one specifies which combination of the record's fields makes up the key. An arbitrary combination of fields may be used. See the [https://github.com/stratosphere-eu/stratosphere/blob/master/pact/pact-examples/src/main/java/eu/stratosphere/pact/example/relational/TPCHQuery3.java|TPCH Query Example] for how programs that define //Reduce// and //Match// contracts on one or more fields can be written to minimally move data between fields.
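
The idea of a key made up of several record fields can be illustrated with a minimal Python sketch. This is not the PACT API: records are modeled as plain tuples, and the helper names `extract_key` and `group_by_key` are hypothetical, chosen only for this illustration. It shows how a //Reduce//-style grouping could work when the key is a combination of field positions.

```python
from collections import defaultdict


def extract_key(record, key_fields):
    """Build the key as a tuple of the values at the given field positions."""
    return tuple(record[i] for i in key_fields)


def group_by_key(records, key_fields):
    """Group records that agree on all key fields (Reduce-style grouping)."""
    groups = defaultdict(list)
    for record in records:
        groups[extract_key(record, key_fields)].append(record)
    return dict(groups)


# Records with a free schema; here: (name, country, amount).
records = [
    ("alice", "DE", 10),
    ("bob", "DE", 20),
    ("alice", "DE", 5),
    ("alice", "FR", 7),
]

# The key is the combination of fields 0 and 1; field 2 is the payload.
groups = group_by_key(records, key_fields=(0, 1))
totals = {key: sum(r[2] for r in recs) for key, recs in groups.items()}
```

Because the key is named by field positions, the same records could be regrouped on a different combination, for example `key_fields=(1,)`, without first copying values into a dedicated key slot.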
 
The record may be sparsely filled, i.e., it may have fields that have //null// values. It is legal to produce a record where, for example, only fields 2 and 5 are set; fields 1, 3, and 4 are then interpreted to be //null//. Fields that a contract uses as key fields, however, must not be null; otherwise an exception is raised.
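
The sparse-record behavior can be sketched as follows. This is an illustration under stated assumptions, not the actual Pact Record implementation: the class `SparseRecord`, its methods, and the function `key_of` are hypothetical names, and `ValueError` stands in for whatever exception the real system raises.

```python
class SparseRecord:
    """Hypothetical record whose unset fields read as None (null)."""

    def __init__(self):
        self._fields = {}  # field position -> value; absent means null

    def set_field(self, position, value):
        self._fields[position] = value

    def get_field(self, position):
        return self._fields.get(position)


def key_of(record, key_fields):
    """Return the key tuple; reject keys that contain a null field."""
    key = tuple(record.get_field(i) for i in key_fields)
    if any(value is None for value in key):
        raise ValueError("key fields must not be null: %r" % (key_fields,))
    return key


record = SparseRecord()
record.set_field(2, "x")
record.set_field(5, 42)

# Fields 1, 3, and 4 were never set, so they read as null.
unset = [record.get_field(i) for i in (1, 3, 4)]

# Using only populated fields as the key is fine.
good_key = key_of(record, (2, 5))
```

Calling `key_of(record, (1, 2))` in this sketch raises `ValueError`, because field 1 is null and may therefore not be part of a key.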