A MapReduce program is composed of a '''Map()''' procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a '''Reduce()''' procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies). The "MapReduce System" (also called "infrastructure" or "framework") orchestrates the processing by [[Marshalling (computer science)|marshalling]] the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, and providing for [[Redundancy (engineering)|redundancy]] and [[Fault-tolerant computer system|fault tolerance]].
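The student example above can be illustrated with a minimal, single-process sketch in Python. It is not the API of any real MapReduce framework. The function names <code>map_students</code>, <code>reduce_queue</code> and <code>run_mapreduce</code>, and the sample names, are hypothetical. The grouping loop stands in for the "shuffle" that a real framework performs across distributed servers. The map step puts each student into the queue for their first name, and the reduce step counts each queue.

<syntaxhighlight lang="python">
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_students(student: str) -> Iterator[Tuple[str, str]]:
    # Hypothetical Map(): emit (first name, full name), which places the
    # student into the queue for their first name.
    first_name = student.split()[0]
    yield first_name, student


def reduce_queue(first_name: str, queue: List[str]) -> Tuple[str, int]:
    # Hypothetical Reduce(): summarize one queue by counting its members.
    return first_name, len(queue)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[Tuple[str, str]]],
    reducer: Callable[[str, List[str]], Tuple[str, int]],
) -> List[Tuple[str, int]]:
    # Shuffle (sequential stand-in): group intermediate values by key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Apply the reducer to each group, in sorted key order.
    return [reducer(key, groups[key]) for key in sorted(groups)]


students = ["Alice Smith", "Bob Jones", "Alice Brown",
            "Carol White", "Bob Stone", "Alice Green"]
print(run_mapreduce(students, map_students, reduce_queue))
# [('Alice', 3), ('Bob', 2), ('Carol', 1)]
</syntaxhighlight>

In a real framework, the map calls and the per-key reduce calls would be distributed across many machines. The sketch runs them in one process only to show the data flow.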
The model is inspired by the [[map (higher-order function)|map]] and [[fold (higher-order function)|reduce]] functions commonly used in [[functional programming]],<ref name="map">"Our abstraction is inspired by the map and reduce primitives present in Lisp and many other functional languages." -[http://research.google.com/archive/mapreduce.html "MapReduce: Simplified Data Processing on Large Clusters"], by Jeffrey Dean and Sanjay Ghemawat; from [[Google Research]]</ref> although their purpose in the MapReduce framework is not the same as in their original forms.<ref>{{cite doi|10.1016/j.scico.2007.07.001}}</ref> The key contributions of the MapReduce framework are not the actual map and reduce functions, but the scalability and fault tolerance achieved for a variety of applications by optimizing the execution engine once. As such, a [[single-threaded]] implementation of MapReduce (such as the one in [[MongoDB]]) will usually not be faster than a traditional (non-MapReduce) implementation; any gains are usually seen only with [[multi-threaded]] implementations.<ref name=stackoverflow>{{cite web
| url = https://stackoverflow.com/questions/3947889/mongodb-terrible-mapreduce-performance
| title = MongoDB: Terrible MapReduce Performance
|