ECL (data-centric programming language): Difference between revisions

Jratke (talk | contribs)
Update ecl for big data definition correctly
 
Jratke (talk | contribs)
No edit summary
Line 20:
}}
 
'''ECL''' is a declarative, data-centric programming language designed in 2000 to allow a team of programmers to process Big Data across a high-performance computing cluster without having to make many of the lower-level, imperative decisions.<ref>[http://www.lexisnexis.com/risk/about/guides/program-guide.html A Guide to ECL], [[LexisNexis]].</ref>
 
== History ==
ECL was initially designed and developed in 2000 as an in-house productivity tool within [[Seisint]] Inc. and was considered a ‘secret weapon’ that allowed Seisint to gain market share in its data business. The technology was cited as a driving force behind the acquisition of Seisint by [[LexisNexis]] and again as a major source of synergies when LexisNexis acquired ChoicePoint Inc.
 
 
== Implementations ==
LexisNexis and its partners were chosen for a [[DARPA]] project to create a prototype of a new kind of data-centric supercomputer. DARPA is the research and development office of the [[U.S. Department of Defense]] (DoD). Its mission is to maintain the technological superiority of the U.S. military and to prevent technological surprise from harming U.S. [[national security]].<ref>[http://www.darpa.mil/about.html DARPA.]</ref>
 
DARPA created the Ubiquitous High Performance Computing (UHPC) program to provide the technology needed to meet the steadily increasing demands of DoD applications, from embedded systems to command centers, and expandable to high-performance computing systems.<ref>[http://www.er.doe.gov/ascr/Research/CS/UHPC%20DARPA-SN-09-46_RFI.pdfl DARPA High Performance Computing.]</ref>
 
Earlier, [[Sandia National Labs]] and LexisNexis were chosen by DARPA as one of four teams to design a new kind of data-centric supercomputer prototype.<ref>[https://share.sandia.gov/news/resources/news_releases/supercomputer-prototype/ttp://insidehpc.com/2010/08/19/uhpc-the-sandia-team/ Sandia team and High Performance Computing.]</ref>
 
== Language Constructs ==
ECL, at least in its purest form, is a declarative, data-centric language. Programs, in the strictest sense, do not exist. Rather, an ECL application specifies a number of core datasets (or data values) and then the operations to be performed on those values.
 
 
=== Hello world ===
ECL is designed to provide succinct solutions to problems and sensible defaults. The ‘Hello World’ program is characteristically short:
 
'Hello World'
A more representative example takes a list of strings, sorts them, and returns the sorted list as the result.
 
// First declare a dataset with one column containing a list of strings
// Datasets can also be binary, csv, xml or externally defined structures
 
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
 
SD := SORT(D,Value);
OUTPUT(SD);
 
Statements containing := are attribute definitions in ECL. They do not denote an action; rather, they define a term. Thus, logically, an ECL program can be read from bottom to top:
 
 
OUTPUT(SD)
Line 115 ⟶ 57:
What is a D?
 
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
 
D is a dataset with one column, labeled ‘Value’, that contains the list of strings given in the definition.
 
=== ECL Primitives ===
ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE, PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN, DISTRIBUTE
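
As a minimal sketch of how a few of these primitives combine, the following example chains SORT, DEDUP, and CHOOSEN over a small inline dataset. The record layout Rec, the sample values, and the attribute names are assumptions made for illustration; they are not taken from the LexisNexis documentation.

```ecl
// Hypothetical record layout and sample data, for illustration only
Rec := {STRING Value;};
D := DATASET([{'ECL'},{'Data'},{'ECL'},{'Language'}], Rec);

// Order the records by Value
SD := SORT(D, Value);

// Drop adjacent records whose Value is equal (the input is already sorted)
UD := DEDUP(SD, LEFT.Value = RIGHT.Value);

// Keep only the first two records of the de-duplicated result
FirstTwo := CHOOSEN(UD, 2);

OUTPUT(FirstTwo);
```

As with the earlier example, each := line only defines a term; the single OUTPUT action at the end is what causes the chain of definitions to be evaluated.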
 
 
=== ECL Encapsulation ===
Although ECL is terse (LexisNexis claims that one line of ECL is roughly equivalent to 120 lines of C++), it still has significant support for large-scale programming, including data encapsulation and code reuse. The constructs available include MODULE, FUNCTION, INTERFACE, MACRO, EXPORT, and SHARED.
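
The sketch below suggests how MODULE, FUNCTION, EXPORT, and SHARED might be used together. The module name WordTools and the attributes Rec, Items, and FirstN are hypothetical names chosen for this example only.

```ecl
// Hypothetical module; all names here are illustrative assumptions
WordTools := MODULE
  // SHARED: available to the other definitions in this module
  SHARED Rec := {STRING Value;};

  // EXPORT: available to code outside the module as WordTools.Items
  EXPORT Items := DATASET([{'ECL'},{'Declarative'},{'Data'}], Rec);

  // A FUNCTION groups local definitions and returns one result
  EXPORT FirstN(UNSIGNED n) := FUNCTION
    InOrder := SORT(Items, Value);
    RETURN CHOOSEN(InOrder, n);
  END;
END;

OUTPUT(WordTools.FirstN(2));
```

Callers refer to exported definitions through the module name, while the record layout stays an internal detail of the module.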
 
 
=== Support for Parallelism in ECL ===
In the HPCC implementation, by default, most ECL constructs will execute in parallel across the hardware being used. Many of the primitives also have a LOCAL option to specify that the operation is to occur locally on each node.
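
To illustrate the difference, the following sketch contrasts a default (global) SORT with a SORT that uses the LOCAL option after an explicit DISTRIBUTE. The dataset, the attribute names, and the choice of HASH32 as the distribution expression are assumptions for illustration.

```ecl
// Hypothetical layout and data, for illustration only
Rec := {STRING Value;};
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'}], Rec);

// Default behaviour: the result is ordered across the whole cluster
GlobalSorted := SORT(D, Value);

// Assign records to nodes by a hash of Value, then sort each node's
// share independently; with LOCAL, no records move between nodes
ByHash := DISTRIBUTE(D, HASH32(Value));
LocalSorted := SORT(ByHash, Value, LOCAL);

OUTPUT(GlobalSorted, NAMED('GlobalSorted'));
OUTPUT(LocalSorted, NAMED('LocalSorted'));
```

The LOCAL result is ordered only within each node's portion of the data, so it is not necessarily in global order when the portions are read back together.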
 
 
=== Comparison to Map-Reduce ===
The Hadoop Map-Reduce paradigm consists of three phases, which correlate to ECL primitives as follows:
 
 
{{clear}}
{| class="wikitable sortable" style="font-size: smaller; text-align: center; width: auto;"
 
|-
! Hadoop Name/Term
Line 160 ⟶ 82:
! MAPing within the MAPper
! PROJECT/TRANSFORM
! Takes a record and converts it to a different format; in the [[Hadoop]] case the conversion is into a key-value pair
 
|-
! SHUFFLE (Phase 1)
Line 178 ⟶ 98:
 
== References ==
<!--- See [[Wikipedia:Footnotes]] on how to create references using <ref></ref> tags which will then appear here automatically -->
 
{{Reflist}}
 
== External links ==
* [http://www.nytimes.com/2008/02/21/technology/21iht-reed.4.10279549.html Reed Elsevier to acquire ChoicePoint for $3.6 billion]
 
* [http://www.bloomberg.com/apps/news?pid=newsarchive&sid=aBuqYZDOSPL4&refer=uk Reed Elsevier's LexisNexis Buys Seisint for $775 Mln]
* [http://www.reuters.com/finance/stocks/keyDevelopments?symbol=ENL&pn=15 Reed Elsevier]
 