* The PACT programming model encourages a more modular programming style. Although the number of user functions is usually higher, they are finer-grained and focus on specific problems. Hence, the interweaving of functionality that is common in MapReduce jobs can be avoided.
* Data analysis tasks can be expressed as straightforward data flows, especially when multiple inputs are required.
* PACT uses a record-based data model, which reduces the need to specify custom data types because not all data items have to be packed into a single value type.
* PACT often eliminates the need for auxiliary structures, such as the distributed cache, that "break" the parallel programming model.
* Data organization operations, such as building a Cartesian product or combining records with equal keys, are performed by the runtime system (a conceptual sketch follows this list). In MapReduce, such frequently needed functionality must be implemented by the developer in user code.
* PACTs specify data parallelization in a declarative way, which leaves the system several degrees of freedom. These degrees of freedom are an important prerequisite for automatic optimization: the [[PactCompiler|PACT compiler]] enumerates different execution strategies and chooses the one with the lowest estimated amount of data to ship (see the cost sketch after this list). In contrast, Hadoop always executes MapReduce jobs with the same strategy.
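The following sketch illustrates the point about runtime-provided data organization. It is plain Java and not the actual PACT API: the Rec record, the MatchStub interface, and the matchByKey helper are made up for this illustration. It mimics what a Match-style contract does for the user: the system pairs up records with equal keys from two inputs, so the user function contains only the join logic itself. In MapReduce, the developer would have to emulate this pairing manually, for example by tagging values and grouping them in the reducer. The multi-field Rec type also hints at the record-based data model mentioned above.

<code java>
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: mimics what a Match-style contract provides.
// All names (Rec, MatchStub, matchByKey) are invented for this sketch
// and are not part of the real PACT API.
public class MatchSketch {

    // A simple multi-field record, standing in for PACT's record data model.
    record Rec(int key, String value) {}

    // The user function: it only ever sees pairs of records that share a key.
    interface MatchStub {
        Rec match(Rec left, Rec right);
    }

    // What the "runtime" does on the user's behalf: pair up records with equal
    // keys from two inputs. In MapReduce, the developer has to emulate this,
    // e.g. by tagging values and joining them manually in reduce().
    static List<Rec> matchByKey(List<Rec> left, List<Rec> right, MatchStub stub) {
        Map<Integer, List<Rec>> rightByKey = new HashMap<>();
        for (Rec r : right) {
            rightByKey.computeIfAbsent(r.key(), k -> new ArrayList<>()).add(r);
        }
        List<Rec> output = new ArrayList<>();
        for (Rec l : left) {
            for (Rec r : rightByKey.getOrDefault(l.key(), List.of())) {
                output.add(stub.match(l, r));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Rec> orders = List.of(new Rec(1, "order-A"), new Rec(2, "order-B"));
        List<Rec> items  = List.of(new Rec(1, "item-X"), new Rec(1, "item-Y"));

        // The user code is only the join logic; all key handling happens above.
        List<Rec> joined = matchByKey(orders, items,
                (l, r) -> new Rec(l.key(), l.value() + "|" + r.value()));

        for (Rec rec : joined) {
            System.out.println(rec.key() + ": " + rec.value());
        }
    }
}
</code>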
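The bullet on automatic optimization can be made concrete with a toy cost comparison. The sketch below is not the PactCompiler's actual cost model; the strategies, formulas, and size estimates are assumptions chosen only to show the principle of enumerating execution strategies and picking the one that ships the least data.

<code java>
// Illustrative only: a toy version of the cost-based choice a PACT-style
// compiler makes. The strategies, formulas, and numbers are assumptions for
// this sketch, not the actual PactCompiler implementation.
public class StrategyChoiceSketch {

    // Estimated bytes shipped if both inputs are repartitioned by key:
    // roughly every record of both inputs crosses the network once.
    static long repartitionCost(long leftBytes, long rightBytes) {
        return leftBytes + rightBytes;
    }

    // Estimated bytes shipped if the smaller input is broadcast:
    // it is sent to each of the 'parallelism' worker instances.
    static long broadcastCost(long smallerBytes, int parallelism) {
        return smallerBytes * (long) parallelism;
    }

    public static void main(String[] args) {
        long leftBytes  = 10L * 1024 * 1024 * 1024; // 10 GiB (assumed estimate)
        long rightBytes = 64L * 1024 * 1024;        // 64 MiB (assumed estimate)
        int parallelism = 32;

        long repartition = repartitionCost(leftBytes, rightBytes);
        long broadcast   = broadcastCost(Math.min(leftBytes, rightBytes), parallelism);

        // Pick the strategy with the least estimated data to ship.
        String chosen = broadcast < repartition ? "broadcast smaller input" : "repartition both inputs";
        System.out.println("repartition ships ~" + repartition + " bytes");
        System.out.println("broadcast   ships ~" + broadcast + " bytes");
        System.out.println("chosen strategy: " + chosen);
    }
}
</code>

With the assumed sizes, broadcasting the small input ships far less data than repartitioning both inputs, so a cost-based optimizer would prefer it; with two large inputs the decision would flip. A fixed MapReduce execution strategy cannot make this trade-off.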
For a more detailed comparison of the MapReduce and PACT programming models, see our paper //"MapReduce and PACT - Comparing Data Parallel Programming Models"// on our [[http://www.stratosphere.eu/publications|publications page]].