Data-centric programming language

{{Userspace draft|source=ArticleWizard|date=May 2011}}
 
A '''data-centric programming language''' belongs to a [[category]] of programming languages whose primary function is the management and manipulation of data. A data-centric programming language includes built-in processing primitives for accessing data stored in sets, tables, lists, and other data structures and databases, and for the specific manipulation and transformation of data required by a programming application. Data-centric programming languages are typically [[declarative]] and often dataflow-oriented: they define the desired processing result rather than the specific processing steps required to produce it, which are left to the language compiler. The [[SQL]] relational database language is an example of a declarative, data-centric language. Declarative, data-centric programming languages are well suited to [[Data Intensive Computing|data intensive computing]] applications.
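
The contrast between stating a result and spelling out the steps can be illustrated with a small sketch. The example below is an illustration only, written in Python with its standard <code>sqlite3</code> module; the <code>payments</code> table and its sample rows are hypothetical and are not taken from any source for this article. The same per-name total is computed once with an explicit loop and once with a declarative SQL query, where the database engine chooses the processing steps.

```python
import sqlite3

# Hypothetical sample data: (name, amount) pairs.
rows = [("alice", 30), ("bob", 45), ("alice", 25)]

# Imperative style: the program spells out every processing step.
totals = {}
for name, amount in rows:
    totals[name] = totals.get(name, 0) + amount
imperative_result = sorted(totals.items())

# Declarative style: the query states the desired result;
# the SQL engine decides how to compute it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
declarative_result = conn.execute(
    "SELECT name, SUM(amount) FROM payments GROUP BY name ORDER BY name"
).fetchall()
conn.close()

print(imperative_result)
print(declarative_result)
```

Both approaches produce the same totals; only the second leaves the choice of access path and aggregation strategy to the system.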
 
== Background ==
 
===HPCC ECL===
 
<span>
[[File:Data-Centric Figure3.jpg|thumb|left|Figure 3: ECL sample syntax for JOIN operation]]
 
The HPCC data-intensive computing platform from LexisNexis Risk Solutions includes a high-level declarative, data-centric programming language called ECL. ECL allows the programmer to define what the data processing result should be, along with the dataflows and transformations necessary to achieve that result. The language includes extensive capabilities for data definition, filtering, data management, and data transformation, and provides an extensive set of built-in functions that operate on the records in datasets and can incorporate user-defined transformation functions. ECL programs are compiled into optimized C++ source code, which is subsequently compiled into executable code and distributed to the nodes of a processing cluster. ECL combines data representation with algorithm implementation, fusing a query language with a parallel data processing language.
</span>
 
<span>
[[File:Data-Centric Figure5.jpg|thumb|right|Figure 5: ECL code example for NLP]]
[[File:Data-Centric Figure4.jpg|thumb|left|Figure 4: ECL code example]]
ECL includes built-in data transform operations that process entire datasets, including PROJECT, ITERATE, ROLLUP, JOIN, COMBINE, FETCH, NORMALIZE, DENORMALIZE, and PROCESS. For example, the transform function defined for a JOIN operation receives two records, one from each dataset being joined, can perform any operations on the fields in the pair of records, and returns an output record that can be completely different from either of the input records. Example syntax for the JOIN operation from the ECL Language Reference Manual is shown in Figure 3. Figure 4 shows the ECL code equivalent to the Pig example program shown in Figure 1.
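
The idea of a join driven by a transform function can be sketched outside ECL. The following Python example is an analogy under stated assumptions, not ECL code and not the platform's implementation: the record types <code>Person</code>, <code>Order</code>, and <code>Summary</code>, the helper <code>join_with_transform</code>, and the sample data are all hypothetical. It performs a simple inner join and passes each matching pair of records to a transform that builds an output record with a different layout from either input.

```python
from typing import NamedTuple

# Hypothetical record layouts for the two input datasets and the output.
class Person(NamedTuple):
    person_id: int
    name: str

class Order(NamedTuple):
    person_id: int
    total: float

class Summary(NamedTuple):
    label: str
    total_cents: int

def join_with_transform(left, right, key_left, key_right, transform):
    """Inner join: call transform(l, r) for every pair with equal keys."""
    index = {}
    for r in right:
        index.setdefault(key_right(r), []).append(r)
    output = []
    for l in left:
        for r in index.get(key_left(l), []):
            output.append(transform(l, r))
    return output

def make_summary(person, order):
    # The output record need not resemble either input record.
    return Summary(label=person.name.upper(),
                   total_cents=round(order.total * 100))

people = [Person(1, "ann"), Person(2, "raj")]
orders = [Order(1, 9.5), Order(1, 3.25), Order(3, 7.0)]

summaries = join_with_transform(
    people, orders,
    key_left=lambda p: p.person_id,
    key_right=lambda o: o.person_id,
    transform=make_summary,
)
print(summaries)
```

Only the two orders for person 1 have a match, so the result contains two <code>Summary</code> records; the unmatched person and the unmatched order are dropped, as in an inner join.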
 
 
The ECL programming language also provides built-in primitives for [[Natural language processing]] (NLP) with PATTERN statements and the built-in PARSE operation. PATTERN statements allow matching patterns, including regular expressions, to be defined and used to parse information from unstructured data such as raw text. PATTERN statements can be combined to implement complex parsing operations or complete grammars from [[Backus-Naur Form]] (BNF) definitions. The PARSE operation operates across a dataset of records on a specific field within each record; this field could be, for example, an entire line in a text file. Using this capability of the ECL language, it is possible to implement parallel processing for information extraction applications across document files and all types of unstructured and semi-structured data, including XML-based documents or Web pages. Figure 5 shows an example of ECL code used in a log analysis application which incorporates NLP.</span>{{-}}
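
The approach of composing named patterns and applying them to one field of every record can be sketched with ordinary regular expressions. The Python example below is an analogy only and does not reproduce ECL's PATTERN or PARSE syntax; the log line format, the field name <code>line</code>, and the helper <code>parse_records</code> are hypothetical choices made for illustration.

```python
import re

# Small named patterns, combined into one larger pattern.
IP = r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"
METHOD = r"(?P<method>GET|POST)"
PATH = r"(?P<path>\S+)"
STATUS = r"(?P<status>\d{3})"
LINE = re.compile(rf"{IP} {METHOD} {PATH} {STATUS}")

def parse_records(records, field):
    """Apply the combined pattern to one field of every record.

    Records whose field does not match are skipped.
    """
    extracted = []
    for record in records:
        match = LINE.search(record[field])
        if match:
            extracted.append({
                "ip": match["ip"],
                "method": match["method"],
                "path": match["path"],
                "status": int(match["status"]),
            })
    return extracted

# Hypothetical dataset: each record holds one raw text line.
log = [
    {"line": "10.0.0.1 GET /index.html 200"},
    {"line": "malformed entry"},
    {"line": "10.0.0.2 POST /login 403"},
]

parsed = parse_records(log, "line")
print(parsed)
```

Because each record is parsed independently of the others, this kind of extraction can in principle be divided across the nodes of a cluster, which is the property the paragraph above describes.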
 
 
== See Also ==