Revision as of 07:37, 7 March 2011 by Jratke (edit summary: New language for Big Data described)
Revision as of 07:38, 7 March 2011 by Jratke (edit summary: →=MapReduce)
Line 41:

However, most data growth is in unstructured data, and new processing paradigms with more flexible data models were needed. Several solutions have emerged, including the [[MapReduce]] architecture pioneered by [[Google]] and now available in an open-source implementation called [[Hadoop]], which is used by [[Yahoo]], [[Facebook]], and others. [[LexisNexis Risk Solutions]] also developed and implemented a scalable platform for data-intensive computing, which is used by [[LexisNexis]].

===MapReduce===

The [[MapReduce]] architecture and programming model pioneered by [[Google]] is an example of a modern systems architecture designed for data-intensive computing.<ref>[http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] by J. Dean and S. Ghemawat. Proceedings of the Sixth Symposium on Operating System Design and Implementation (OSDI), 2004.</ref> The MapReduce architecture allows programmers to use a functional programming style to create a map function that processes a [[key-value pair]] associated with the input data to generate a set of intermediate [[key-value pair]]s, and a reduce function that merges all intermediate values associated with the same intermediate key. Because the system automatically handles details such as partitioning the input data, scheduling and executing tasks across a processing cluster, and managing communication between nodes, programmers with no experience in parallel programming can easily use a large distributed processing environment.
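The map and reduce roles described above can be illustrated with a minimal single-process sketch in Python. This is not Google's or Hadoop's API: the names <code>map_word_count</code>, <code>reduce_word_count</code>, and <code>run_mapreduce</code> are hypothetical, the word-count task and sample documents are chosen only for illustration, and an in-memory dictionary stands in for the partitioning, scheduling, and inter-node communication that a real cluster framework would handle.

```python
from collections import defaultdict


def map_word_count(doc_id, text):
    """Map step: take one input key-value pair (document id, text) and
    emit an intermediate (word, 1) pair for every word in the text."""
    for word in text.lower().split():
        yield word, 1


def reduce_word_count(word, counts):
    """Reduce step: merge all intermediate values that share one key."""
    return sum(counts)


def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical single-process driver (an assumption for illustration).

    It groups intermediate values by key in memory, then applies the
    reduce function once per key. A real framework would distribute
    both phases across many machines.
    """
    intermediate = defaultdict(list)
    for key, value in inputs:
        for ikey, ivalue in map_fn(key, value):
            intermediate[ikey].append(ivalue)
    return {ikey: reduce_fn(ikey, ivalues)
            for ikey, ivalues in intermediate.items()}


# Illustrative input data, not taken from any real data set.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_word_count, reduce_word_count)
```

In this sketch the programmer writes only the two small functions. The driver plays the part of the runtime system, which is the division of labor the programming model is meant to provide.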

Data-intensive computing: Difference between revisions