Data-centric programming language

[[File:Data-Centric Figure1.jpg|thumb|left|Figure 1: Sample Pig Latin program <ref name="PigLatin"/>]]
[[File:Data-Centric Figure2.jpg|thumb|right|Figure 2: Pig program translation to MapReduce<ref name="PigLatin"/>]]
Hadoop is an open source software project sponsored by The Apache Software Foundation (http://www.apache.org) that implements the MapReduce architecture. The Hadoop execution environment includes additional distributed data processing tools designed to run on top of Hadoop MapReduce. One of these is Pig, a high-level data-flow programming language and execution framework for data-intensive computing. Pig was developed at Yahoo! to provide a data-centric language notation for data analysis applications, improving programmer productivity and shortening development cycles in the Hadoop MapReduce environment. The execution environment automatically translates Pig programs into sequences of MapReduce jobs as needed. The language provides operations for loading, storing, filtering, grouping, de-duplicating, ordering, sorting, aggregating, and joining data.<ref name="PigLatin">[http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,UseruCode Pig Latin: A Not-So-Foreign Language for Data Processing] {{Webarchive|url=https://web.archive.org/web/20110720045445/http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,UseruCode |date=2011-07-20 }}, by C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Stanford University, 2008.</ref> Figure 1 shows a sample Pig program, and Figure 2 shows how it is translated into a series of MapReduce operations.
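The following Python sketch is a simplified model of how a Pig-style data flow containing a single GROUP can be carried out as one MapReduce job. It does not reproduce the program in Figure 1, and it is not Pig's actual compiler. The sample relation, the field names (url, category, pagerank), the thresholds, and the functions <code>map_phase</code>, <code>shuffle</code>, <code>reduce_phase</code>, and <code>run_job</code> are all hypothetical. The model assumes that a FILTER placed before the GROUP runs in the map phase, that the GROUP key becomes the shuffle key, and that operations after the GROUP (a filter on COUNT and an AVG aggregate) run in the reduce phase.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str, float]  # hypothetical (url, category, pagerank) tuple

# Hypothetical sample relation; not data from the article.
URLS: List[Record] = [
    ("a.com", "news", 0.9),
    ("b.com", "news", 0.4),
    ("c.com", "news", 0.1),
    ("d.com", "sports", 0.3),
    ("e.com", "tech", 0.5),
    ("f.com", "tech", 0.7),
]

# Pig Latin-style data flow that this sketch emulates (illustrative only):
#   good   = FILTER urls BY pagerank > 0.2;
#   groups = GROUP good BY category;
#   big    = FILTER groups BY COUNT(good) > 1;
#   out    = FOREACH big GENERATE group, AVG(good.pagerank);


def map_phase(records: Iterable[Record], min_rank: float) -> Iterator[Tuple[str, float]]:
    """Map: apply the pre-GROUP FILTER and emit (group key, value) pairs."""
    for _url, category, pagerank in records:
        if pagerank > min_rank:
            yield category, pagerank


def shuffle(pairs: Iterable[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Shuffle/sort: collect every value for each key, ordered by key."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped: Dict[str, List[float]], min_count: int) -> Iterator[Tuple[str, float]]:
    """Reduce: keep groups whose COUNT exceeds min_count, then compute AVG."""
    for category, ranks in grouped.items():
        if len(ranks) > min_count:
            yield category, sum(ranks) / len(ranks)


def run_job(
    records: Iterable[Record], min_rank: float = 0.2, min_count: int = 1
) -> List[Tuple[str, float]]:
    """In this model, one GROUP corresponds to one map, shuffle, reduce job."""
    return list(reduce_phase(shuffle(map_phase(records, min_rank)), min_count))


if __name__ == "__main__":
    for category, avg_rank in run_job(URLS):
        print(f"{category}\t{avg_rank:.2f}")
```

With the sample data, the job keeps the news and tech categories, whose average PageRanks are roughly 0.65 and 0.6; the sports group is dropped after grouping because it contains only one qualifying URL. A data flow with several GROUP steps would, under the same simplified model, chain several such jobs, which corresponds to the "series of MapReduce operations" described above. Real Pig execution plans include further optimizations that are not modelled here.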
 
===HPCC ECL===