}}
'''ECL''' is a declarative, data-centric programming language designed in 2000 to allow a team of programmers to process big data across a high-performance computing cluster without the programmer being involved in many of the lower-level, imperative decisions.<ref>[http://www.lexisnexis.com/risk/about/guides/program-guide.html A Guide to ECL, Lexis-Nexis.]</ref>
== History ==
ECL was initially designed and developed in 2000 as an in-house productivity tool within Seisint Inc, and was considered a ‘secret weapon’ that allowed [[Seisint]] to gain market share in its data business. The technology was cited as a driving force behind the acquisition of Seisint by [[LexisNexis]], and again as a major source of synergies when LexisNexis acquired ChoicePoint Inc.
== Implementations ==
LexisNexis, and its partners, were chosen for a [[DARPA]] project to create a prototype of a new kind of data-centric supercomputer. DARPA is the research and development office for the [[United States Department of Defense|U.S. Department of Defense]] (DoD). DARPA’s mission is to maintain the technological superiority of the U.S. military and prevent technological surprise from harming [[national security]].<ref>[http://www.darpa.mil/about.html DARPA.]</ref>
DARPA created the Ubiquitous High Performance Computing (UHPC) program to provide the revolutionary technology needed to meet the steadily increasing demands of DoD applications – from embedded to command center, and expandable to high-performance computing systems.<ref>[http://www.er.doe.gov/ascr/Research/CS/UHPC%20DARPA-SN-09-46_RFI.pdf DARPA High Performance Computing.]</ref>
Earlier, [[Sandia National Labs]] and LexisNexis were chosen by DARPA as one of four teams to design a new kind of data-centric supercomputer prototype.<ref>[https://share.sandia.gov/news/resources/news_releases/supercomputer-prototype/ Sandia supercomputer prototype news release.]</ref><ref>[http://insidehpc.com/2010/08/19/uhpc-the-sandia-team/ UHPC: the Sandia team, insideHPC.]</ref>
== Language Constructs ==
ECL, at least in its purest form, is a declarative, data-centric language. Programs, in the strictest sense, do not exist. Rather, an ECL application specifies a number of core datasets (or data values) and then the operations which are to be performed on those values.
=== Hello world ===
ECL is designed to offer succinct solutions to problems and sensible defaults. The ‘Hello World’ program is characteristically short:
<syntaxhighlight lang="ecl">
'Hello World'
</syntaxhighlight>
Perhaps a more flavorful example would take a list of strings, sort them into order, and then return that as a result instead.
<syntaxhighlight lang="ecl">
// First declare a dataset with one column containing a list of strings
// Datasets can also be binary, CSV, XML or externally defined structures
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
SD := SORT(D,Value);
OUTPUT(SD);
</syntaxhighlight>
The statements containing a := are defined in ECL as attribute definitions. They do not denote an action but rather the definition of a term. Thus, logically, an ECL program can be read “bottom to top”:
<syntaxhighlight lang="ecl">
OUTPUT(SD);
</syntaxhighlight>
What is an SD?
<syntaxhighlight lang="ecl">
SD := SORT(D,Value);
</syntaxhighlight>
SD is simply D, sorted into order by Value.
What is a D?
<syntaxhighlight lang="ecl">
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
</syntaxhighlight>
D is a dataset with one column labeled ‘Value’ and containing the following list of data.
=== ECL Primitives ===
ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE, PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN, DISTRIBUTE
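As a brief sketch of how several of these primitives chain together (the dataset, record, and attribute names here are illustrative assumptions, not taken from any LexisNexis example):

<syntaxhighlight lang="ecl">
// Hypothetical dataset of surnames for illustration
Persons := DATASET([{'Smith'},{'Jones'},{'Smith'},{'Brown'}], {STRING Name;});
Sorted  := SORT(Persons, Name);    // order the records by Name
Unique  := DEDUP(Sorted, Name);    // remove adjacent duplicate Names
Top2    := TOPN(Unique, 2, Name);  // keep the first two entries by Name
OUTPUT(Top2);
</syntaxhighlight>

Because each statement is an attribute definition rather than an action, the compiler is free to reorder or combine these operations when generating the execution graph.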
=== ECL Encapsulation ===
Whilst ECL is terse, and LexisNexis claims that 1 line of ECL is roughly equivalent to 120 lines of C++, it still has significant support for large-scale programming, including data encapsulation and code re-use. The constructs available include: MODULE, FUNCTION, INTERFACE, MACRO, EXPORT, SHARED
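A minimal sketch of encapsulation using MODULE, EXPORT and SHARED (the module and attribute names are hypothetical, invented for this example):

<syntaxhighlight lang="ecl">
IMPORT Std;
// Hypothetical utility module; not part of any LexisNexis library
EXPORT NameUtils := MODULE
  SHARED Prefix := 'Name: ';                              // visible only within the module
  EXPORT Clean(STRING s) := Std.Str.ToUpperCase(TRIM(s)); // callable from other ECL code
  EXPORT Label(STRING s) := Prefix + Clean(s);
END;
</syntaxhighlight>

EXPORTed definitions form the module's public interface, while SHARED definitions remain private to the module, giving ECL a conventional visibility model despite its declarative style.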
=== Support for Parallelism in ECL ===
In the HPCC implementation, by default, most ECL constructs will execute in parallel across the hardware being used. Many of the primitives also have a LOCAL option to specify that the operation is to occur locally on each node.
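For example, a common pattern is to DISTRIBUTE records across the cluster by a hash of a key and then operate on each node's portion independently with LOCAL (a sketch; the dataset shown is illustrative):

<syntaxhighlight lang="ecl">
// Redistribute by a hash of Value, then sort each node's records independently
D := DATASET([{'ECL'},{'Declarative'},{'Data'}], {STRING Value;});
DistD := DISTRIBUTE(D, HASH32(Value));    // equal Values land on the same node
LocalSorted := SORT(DistD, Value, LOCAL); // LOCAL: each node sorts only its own records
OUTPUT(LocalSorted);
</syntaxhighlight>

A LOCAL operation avoids the inter-node communication of a global sort, at the cost of producing output that is only ordered within each node's partition.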
=== Comparison to Map-Reduce ===
The Hadoop Map-Reduce paradigm actually consists of three phases, which correlate to ECL primitives as follows:
{{clear}}
{| class="wikitable sortable" style="font-size: smaller; text-align: center; width: auto;"
|-
! Hadoop name/term
! ECL equivalent
! Comments
|-
| MAPing within the MAPper
| PROJECT/TRANSFORM
| Takes a record and converts it to a different format; in the [[Hadoop]] case the conversion is into a key-value pair
|-
| SHUFFLE (Phase 1)
|}
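The MAP correlate can be sketched in ECL as a PROJECT applying a TRANSFORM that emits a key-value pair (the record and attribute names here are illustrative assumptions, not from any Hadoop or LexisNexis source):

<syntaxhighlight lang="ecl">
// Sketch of the MAP step as PROJECT with a TRANSFORM
InRec := {STRING Word;};
KVRec := {STRING Key; UNSIGNED1 Val;};
Words := DATASET([{'alpha'},{'beta'}], InRec);
KVRec MakePair(InRec L) := TRANSFORM
  SELF.Key := L.Word;  // the word becomes the key
  SELF.Val := 1;       // a count of one, as in word-count's MAPper
END;
Pairs := PROJECT(Words, MakePair(LEFT));
OUTPUT(Pairs);
</syntaxhighlight>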
== References ==
<!--- See [[Wikipedia:Footnotes]] on how to create references using <ref></ref> tags which will then appear here automatically -->
{{Reflist}}
== External links ==
* [http://www.nytimes.com/2008/02/21/technology/21iht-reed.4.10279549.html Reed Elsevier to acquire ChoicePoint for $3.6 billion]
* [http://www.bloomberg.com/apps/news?pid=newsarchive&sid=aBuqYZDOSPL4&refer=uk Reed Elsevier's LexisNexis Buys Seisint for $775 Mln]
* [http://www.reuters.com/finance/stocks/keyDevelopments?symbol=ENL&pn=15 Reed Elsevier]