{{Short description|Category of programming languages}}
'''Data-centric programming language'''
== Background ==
The rapid growth of the [[Internet]] and [[World Wide Web]] has led to huge amounts of information available online and the need for [[Big Data]] processing capabilities. Business and government organizations create large amounts of both structured and [[unstructured data|unstructured]] information which needs to be processed, analyzed, and linked.<ref>[
Computer system architectures such as [[Hadoop]] and [[HPCC]], which can support data-parallel applications, are a potential solution to the terabyte- and petabyte-scale data processing requirements of [[data-intensive computing]].<ref>"A Design Methodology for Data-Parallel Applications
Data-centric programming languages provide a processing approach in which applications are expressed in terms of high-level operations on data, and the runtime system transparently controls the scheduling, execution, load balancing, communications, and movement of programs and data across the computing cluster.<ref>[
Declarative data-centric programming languages are inherently adaptable to various forms of distributed computing, including clusters, data grids, and cloud computing.<ref>[http://demsky.eecs.uci.edu/publications/PLDIbamboo.pdf Bamboo: A Data-Centric, Object-Oriented Approach to Many-core Software], by J. Zhou and B. Demsky. Programming Language Design and Implementation, 2010.</ref> Using declarative, data-centric programming languages involves more than adapting to a new computing capability; it also changes how data analysis and application design are approached.<ref>"[https://www.osti.gov/servlets/purl/883131 Data-Centric Computing with the Netezza Architecture
== Data-centric language examples ==
[[SQL]] is the best known declarative, data-centric programming language and has been in use since the
===Hadoop Pig===
[[File:Data-Centric Figure1.jpg|thumb|left|Figure 1: Sample Pig Latin program <ref name="PigLatin"/>]]
[[File:Data-Centric Figure2.jpg|thumb|right|Figure 2: Pig program translation to MapReduce<ref name="PigLatin"/>]]
Hadoop is an open-source software project sponsored by The Apache Software Foundation (http://www.apache.org) which implements the MapReduce architecture. The Hadoop execution environment supports additional distributed data processing capabilities which are designed to run using the Hadoop MapReduce architecture. These include Pig, a high-level data-flow programming language and execution framework for data-intensive computing. Pig was developed at Yahoo! to provide a specific data-centric language notation for data analysis applications, and to improve programmer productivity and reduce development cycles when using the Hadoop MapReduce environment. Pig programs are automatically translated into sequences of MapReduce programs as needed in the execution environment. Pig provides capabilities in the language for loading, storing, filtering, grouping, de-duplication, ordering, sorting, aggregation, and joining operations on the data.<ref name="PigLatin">[http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,UseruCode Pig Latin: A Not-So-Foreign Language for Data Processing] {{Webarchive|url=https://web.archive.org/web/20110720045445/http://i.stanford.edu/~usriv/talks/sigmod08-pig-latin.ppt#283,18,UseruCode |date=2011-07-20 }}, by C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins. Stanford University, 2008.</ref> Figure 1 shows a sample Pig program and Figure 2 shows how this is translated into a series of MapReduce operations.
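The idea of lowering a data flow (filter, group, aggregate) onto MapReduce phases can be illustrated with a minimal Python sketch. This is an analogy only: it is not Pig Latin, it does not reproduce the program in Figure 1, and the record layout, sample data, and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical input records: (url, category, pagerank).
URLS = [
    ("a.example", "news", 0.9),
    ("b.example", "news", 0.5),
    ("c.example", "sports", 0.1),
    ("d.example", "sports", 0.7),
]

def map_phase(records, min_rank):
    """Filter step plus key extraction: emit (category, pagerank) pairs."""
    for _url, category, pagerank in records:
        if pagerank > min_rank:
            yield category, pagerank

def shuffle(pairs):
    """Group step: collect all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Aggregation step: average pagerank per category."""
    return {key: sum(values) / len(values) for key, values in groups.items()}

def average_rank_by_category(records, min_rank=0.2):
    """Run the whole data flow: filter, then group, then aggregate."""
    return reduce_phase(shuffle(map_phase(records, min_rank)))

if __name__ == "__main__":
    print(average_rank_by_category(URLS))
```

In a data-centric language the programmer writes only the high-level steps; a runtime, rather than hand-written functions like these, decides how the map, shuffle, and reduce work is distributed across the cluster.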
===HPCC ECL===
ECL includes built-in data transform operations which process through entire datasets including PROJECT, ITERATE, ROLLUP, JOIN, COMBINE, FETCH, NORMALIZE, DENORMALIZE, and PROCESS. For example, the transform function defined for a JOIN operation receives two records, one from each dataset being joined, and can perform any operations on the fields in the pair of records, and returns an output record which can be completely different from either of the input records. Example syntax for the JOIN operation from the ECL Language Reference Manual is shown in Figure 3. Figure 4 shows an example of the equivalent ECL code for the Pig example program shown in Figure 1.
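The role of a transform function in a join can be sketched in Python. This is an analogy rather than ECL syntax: the record types, field names, and the <code>join_with_transform</code> helper are hypothetical and are not taken from the ECL Language Reference Manual.

```python
from typing import NamedTuple

class Person(NamedTuple):
    person_id: int
    name: str

class Order(NamedTuple):
    person_id: int
    amount: float

class Summary(NamedTuple):
    # Output layout that differs from both input layouts.
    label: str
    amount_cents: int

def join_with_transform(left, right, key_left, key_right, transform):
    """Inner join: call transform(l, r) for every matching pair of records."""
    index = {}
    for r in right:
        index.setdefault(key_right(r), []).append(r)
    return [transform(l, r) for l in left for r in index.get(key_left(l), [])]

def to_summary(person, order):
    """Transform: receives one record from each side, returns a new record."""
    return Summary(
        label=f"{person.name}#{person.person_id}",
        amount_cents=round(order.amount * 100),
    )

if __name__ == "__main__":
    people = [Person(1, "Ada"), Person(2, "Bob")]
    orders = [Order(1, 12.5), Order(1, 3.0), Order(3, 9.0)]
    for row in join_with_transform(
        people, orders, lambda p: p.person_id, lambda o: o.person_id, to_summary
    ):
        print(row)
```

The point of the sketch is that the output record type is chosen by the transform, so the result of the join need not resemble either input.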
The ECL programming language also provides built-in primitives for [[Natural language processing]] (NLP) with PATTERN statements and the built-in PARSE operation. PATTERN statements allow matching patterns including regular expressions to be defined and used to parse information from unstructured data such as raw text. PATTERN statements can be combined to implement complex parsing operations or complete grammars from [[
== See also ==
* [[Programming language]]
* [[Declarative programming]]
* [[
* [[Parallel computing]]
* [[Distributed computing]]
== References ==
{{reflist}}
[[Category:Parallel computing]]
[[Category:Distributed computing]]
[[Category:Data-centric programming languages]]