Magioladitis (talk | contribs) m Tagging, added Empty section (1) tag using AWB (9814) |
m WP:CHECKWIKI error fix. Section heading problem. Violates WP:MOSHEAD. |
Line 25:
The currently supported Input Contracts and annotations are presented and discussed below.
Input Contracts split the input data of a PACT into independently processable subsets that are handed to the user function of the PACT.
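As a rough illustration of how an Input Contract determines the subsets handed to the user function, the following Python sketch mimics the grouping behavior of Map, Reduce, and CoGroup on plain tuples. The function names (`map_subsets`, `reduce_subsets`, `cogroup_subsets`) and the sample data are hypothetical and are not part of the PACT API; the sketch also assumes that CoGroup hands over a key even when one input has no records for it.

```python
from collections import defaultdict


def map_subsets(records):
    """Map: every record forms its own independently processable subset."""
    return [[r] for r in records]


def reduce_subsets(records, key):
    """Reduce: all records that share a key form one subset."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r)
    return dict(groups)


def cogroup_subsets(left, right, key):
    """CoGroup: per key, the groups of both inputs are handed over together."""
    l, r = reduce_subsets(left, key), reduce_subsets(right, key)
    return {k: (l.get(k, []), r.get(k, [])) for k in sorted(set(l) | set(r))}


# Hypothetical sample inputs; the first field acts as the key.
first = lambda rec: rec[0]
orders = [(1, "book"), (2, "pen"), (1, "lamp")]
customers = [(1, "Ada"), (3, "Bob")]

map_result = map_subsets(orders)
reduce_result = reduce_subsets(orders, first)
cogroup_result = cogroup_subsets(orders, customers, first)
```

Each subset can be processed without looking at any other subset, which is what allows the user function to be called in parallel.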
Line 64 ⟶ 65:
{{:wiki:cogroup.png?nolink&200|}}
In contrast to MapReduce, PACT uses a more generic data model of records ([[PactRecord|Pact Record]]) to pass data between functions. The Pact Record can be thought of as a tuple with a free schema. The interpretation of the fields of a record is up to the user function. A Key/Value pair (as in MapReduce) is a special case of that record with only two fields (the key and the value).
Line 72 ⟶ 74:
The record may be sparsely filled, i.e., it may have fields that have //null// values. It is legal to produce a record where, for example, only fields 2 and 5 are set. Fields 1, 3, 4 are interpreted to be //null//. However, fields that a contract uses as key fields must not be null; otherwise an exception is raised.
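The sparse-record behavior can be pictured with a small Python sketch. The class `SparseRecord` and the helper `extract_key` are hypothetical stand-ins and do not reproduce the actual Pact Record interface; the choice of `ValueError` for the null-key case is likewise an assumption.

```python
class SparseRecord:
    """A tuple with a free schema: only explicitly set positions hold values."""

    def __init__(self):
        self._fields = {}

    def set_field(self, pos, value):
        self._fields[pos] = value

    def get_field(self, pos):
        # Positions that were never set are interpreted as null (None).
        return self._fields.get(pos)


def extract_key(record, key_positions):
    """Read the key fields of a record; null key fields are rejected."""
    key = tuple(record.get_field(p) for p in key_positions)
    if any(v is None for v in key):
        raise ValueError("key fields must not be null")
    return key


rec = SparseRecord()
rec.set_field(2, "x")
rec.set_field(5, 42)
```

With this record, `extract_key(rec, [2])` succeeds, while `extract_key(rec, [1])` raises because field 1 was never set.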
User code annotations are optional in the PACT programming model. They allow the developer to make certain behaviors of her/his user code explicit to the optimizer. The PACT optimizer can use that information to obtain more efficient execution plans. Omitting a valid annotation does not affect the correctness of the result. Invalidly specified annotations, on the other hand, might cause the computation of wrong results. In the following, we list the current set of available Output Contracts.
Line 84 ⟶ 87:
The **Constant Fields Except** annotation is the inverse of the **Constant Fields** annotation. It annotates all fields which might be modified by the annotated user function; hence the optimizer considers **any field that is not annotated as constant**. This annotation should be used very carefully! Again, for binary second-order functions (Cross, Match, CoGroup), one annotation per input can be defined. Note that only one of the two annotations, Constant Fields or Constant Fields Except, may be used for a given input.
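The relationship between the two annotations can be sketched as a set computation. The function `constant_fields` and its parameters are hypothetical, and the sketch assumes that the total number of fields (`num_fields`) is known, which the free record schema does not guarantee in general.

```python
def constant_fields(num_fields, constant=None, constant_except=None):
    """Return the field positions the optimizer may treat as unmodified.

    `constant` lists fields declared constant (Constant Fields);
    `constant_except` lists fields that might be modified
    (Constant Fields Except). Only one of the two may be given per input.
    """
    if constant is not None and constant_except is not None:
        raise ValueError("use only one of the two annotations per input")
    if constant is not None:
        return set(constant)
    if constant_except is not None:
        # Every field that is NOT listed is considered constant.
        return set(range(num_fields)) - set(constant_except)
    # Without an annotation, nothing is assumed to be constant.
    return set()


only_listed = constant_fields(4, constant=[0, 2])
all_but_listed = constant_fields(4, constant_except=[1])
unannotated = constant_fields(4)
```

The sketch also shows why Constant Fields Except needs care: forgetting to list a modified field silently marks it as constant.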
PACT programs are constructed as data flow graphs that consist of data sources, PACTs, and data sinks. One or more data sources read files that contain the input data and generate records from those files. Those records are processed by one or more PACTs, each consisting of an Input Contract, user code, and optional code annotations. Finally, the results are written back to output files by one or more data sinks. In contrast to the MapReduce programming model, a PACT program can be arbitrarily complex and has no fixed structure. \\
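A minimal sketch of such a data flow, written with Python generators, is shown here. All names (`source`, `map_pact`, `reduce_pact`, `sink`, `tokenize`, `count`) are hypothetical, the input is an in-memory list rather than a file, and the example is a simple linear chain although a PACT program may form a more general graph.

```python
from collections import defaultdict


def source(lines):
    """Data source: turn each input line into a one-field record."""
    for line in lines:
        yield (line,)


def map_pact(records, user_fn):
    """PACT with a Map input contract: call the user code once per record."""
    for r in records:
        yield from user_fn(r)


def reduce_pact(records, key_pos, user_fn):
    """PACT with a Reduce input contract: call the user code once per key group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key_pos]].append(r)
    for key in sorted(groups):
        yield from user_fn(key, groups[key])


def sink(records):
    """Data sink: collect the results (a real sink would write a file)."""
    return list(records)


def tokenize(record):
    # User code: emit one (word, 1) record per word.
    for word in record[0].split():
        yield (word, 1)


def count(key, group):
    # User code: sum the counts of all records in the group.
    yield (key, sum(r[1] for r in group))


result = sink(reduce_pact(map_pact(source(["a b", "b c b"]), tokenize), 0, count))
```

Because every stage only consumes and produces records, further PACTs could be attached to any intermediate result instead of following a fixed map-then-reduce shape.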
Line 92 ⟶ 96:
{{:wiki:pactProgram.png?nolink&600|}}
* The PACT programming model encourages a more modular programming style. Although the number of user functions is usually higher, they are more fine-grained and focus on specific problems. Hence, the interweaving of functionality, which is common in MapReduce jobs, can be avoided.