ECL (data-centric programming language)

This is an old revision of this page, as edited by Jratke (talk | contribs) at 09:47, 7 March 2011 (Update ecl for big data definition correctly). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

ECL
Paradigm: declarative, structured, data-centric
First appeared: 2000
Typing discipline: static, strong, safe
Website: hpccsystems.com
Dialects: UCSD, Borland, Turbo
Influenced by: Prolog, Pascal, SQL, Snobol-4, C++, Clarion

ECL is a declarative, data-centric programming language designed in 2000 to allow a team of programmers to process Big Data across a high-performance computing cluster without the programmers being involved in many of the lower-level, imperative decisions.[1]

History

ECL was initially designed and developed in 2000 as an in-house productivity tool within Seisint Inc and was considered to be a ‘secret weapon’ that allowed Seisint to gain market share in its data business. The technology was cited as a driving force behind the acquisition of Seisint by LexisNexis and then again as a major source of synergies when LexisNexis acquired ChoicePoint Inc.

Implementations

LexisNexis and its partners were chosen for a DARPA project to create a prototype of a new kind of data-centric supercomputer. DARPA is the research and development office for the U.S. Department of Defense (DoD). DARPA’s mission is to maintain the technological superiority of the U.S. military and prevent technological surprise from harming U.S. national security.[2]

DARPA has created a Ubiquitous High Performance Computing (UHPC) program to provide the revolutionary technology needed to meet the steadily increasing demands of DoD applications – from embedded to command center and expandable to high performance computing systems.[3]

Earlier, Sandia National Labs and LexisNexis were chosen by DARPA as one of four teams to design a new kind of data-centric supercomputer prototype.[4]

Language Constructs

ECL, at least in its purest form, is a declarative, data-centric language. Programs, in the strictest sense, do not exist. Rather, an ECL application specifies a number of core datasets (or data values) and then the operations which are to be performed on those values.

Hello world

ECL is intended to have succinct solutions to problems and sensible defaults. The ‘Hello World’ program is characteristically short:

'Hello World'

Perhaps a more flavorful example would take a list of strings, sort them into order, and then return that as a result instead.

// First declare a dataset with one column containing a list of strings
// Datasets can also be binary, csv, xml or externally defined structures
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
SD := SORT(D,Value);
output(SD)

Statements containing := are defined in ECL as attribute definitions. They do not denote an action; rather, they define a term. Thus, logically, an ECL program can be read “bottom to top”:

OUTPUT(SD)

What is an SD?

SD := SORT(D,Value);

SD is a D that has been sorted by ‘Value’

What is a D?

D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});

D is a dataset with one column labeled ‘Value’ and containing the following list of data.
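
The difference between a definition and an action can be illustrated with a minimal sketch. The names Fruits, FruitsAsc and FruitsDesc are hypothetical and chosen only for this example. Under the reading described above, only definitions reachable from an action need to be evaluated.

```ecl
// Hypothetical example data; one column named Value, as in the example above
Fruits := DATASET([{'pear'},{'apple'},{'fig'}], {STRING Value;});

// Definitions: each line states what a term is, not something to do
FruitsAsc  := SORT(Fruits, Value);    // ascending by Value
FruitsDesc := SORT(Fruits, -Value);   // descending; not referenced by any action

// The only action. Reading upward from here: FruitsAsc, then Fruits
OUTPUT(FruitsAsc);
```

Because FruitsDesc is never reached when reading upward from the OUTPUT action, it plays no part in the result.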

ECL Primitives

ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE, PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN, DISTRIBUTE
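
A few of these primitives can be combined as in the following sketch. The record layouts, the AddLen transform, and all attribute names are assumptions made for illustration and do not come from any particular ECL library.

```ecl
// Hypothetical record layouts for the input and the reshaped output
WordRec := RECORD
  STRING Value;
END;

WordLenRec := RECORD
  STRING    Value;
  UNSIGNED4 Len;
END;

RawWords := DATASET([{'Data'},{'ECL'},{'Data'},{'Centric'}], WordRec);

// PROJECT: convert each record to a new format, here adding the string length
WordLenRec AddLen(WordRec L) := TRANSFORM
  SELF.Len := LENGTH(L.Value);
  SELF     := L;                      // copy the remaining matching fields
END;
WithLen := PROJECT(RawWords, AddLen(LEFT));

// DEDUP compares adjacent records, so the input is sorted first
UniqueWords := DEDUP(SORT(WithLen, Value), Value);

// TOPN: keep the two records with the largest Len
Longest := TOPN(UniqueWords, 2, -Len);

OUTPUT(Longest);
```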

ECL Encapsulation

Whilst ECL is terse, and LexisNexis claims that 1 line of ECL is roughly equivalent to 120 lines of C++, it still has significant support for large-scale programming, including data encapsulation and code re-use. The constructs available include: MODULE, FUNCTION, INTERFACE, MACRO, EXPORT, SHARED.
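
As a rough sketch of how some of these constructs fit together, the module below groups two definitions and hides a third. WordTools, Greeting, Greet and Twice are hypothetical names used only for this example.

```ecl
// Hypothetical module; all names here are illustrative only
WordTools := MODULE
  // SHARED: usable inside the module, not by callers of the module
  SHARED STRING Greeting := 'Hello ';

  // EXPORT: available to callers as WordTools.Greet
  EXPORT STRING Greet(STRING who) := Greeting + who;

  // FUNCTION structure: local definitions followed by a RETURN
  EXPORT UNSIGNED4 Twice(UNSIGNED4 n) := FUNCTION
    doubled := n * 2;
    RETURN doubled;
  END;
END;

OUTPUT(WordTools.Greet('World'));
OUTPUT(WordTools.Twice(21));
```

Callers see only the exported definitions, so the internal Greeting value can change without affecting code that uses the module.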

Support for Parallelism in ECL

In the HPCC implementation, by default, most ECL constructs will execute in parallel across the hardware being used. Many of the primitives also have a LOCAL option to specify that the operation is to occur locally on each node.
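
The LOCAL option is typically useful once related records have been placed on the same node. The sketch below assumes a small hypothetical dataset of terms, each with an initial count of 1. It distributes the records by a hash of the term and then sorts and combines them node by node. The layout CountRec, the transform AddCounts, and the attribute names are illustrative assumptions.

```ecl
// Hypothetical layout: a term and a running count
CountRec := RECORD
  STRING    Term;
  UNSIGNED4 Cnt;
END;

Terms := DATASET([{'ecl',1},{'data',1},{'ecl',1},{'hpcc',1},{'data',1}], CountRec);

// Send all records with the same Term to the same node
ByTerm := DISTRIBUTE(Terms, HASH(Term));

// LOCAL: each node sorts only the records it holds
LocalSorted := SORT(ByTerm, Term, LOCAL);

// Combine adjacent records that share a Term, again node by node
CountRec AddCounts(CountRec L, CountRec R) := TRANSFORM
  SELF.Cnt := L.Cnt + R.Cnt;
  SELF     := L;
END;
TermCounts := ROLLUP(LocalSorted, LEFT.Term = RIGHT.Term, AddCounts(LEFT, RIGHT), LOCAL);

OUTPUT(TermCounts);
```

Because DISTRIBUTE has already placed every record for a given term on one node, the LOCAL sort and rollup can produce complete per-term counts without further data movement between nodes.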

Comparison to Map-Reduce

The Hadoop Map-Reduce paradigm actually consists of three phases which correlate to ECL primitives as follows:

Hadoop Name/Term | ECL equivalent | Comments
MAPing within the MAPper | PROJECT/TRANSFORM | Takes a record and converts it to a different format; in the Hadoop case the conversion is into a key-value pair
SHUFFLE (Phase 1) | DISTRIBUTE(,HASH(KeyValue)) | The records from the mapper are distributed dependent upon the KEY value
SHUFFLE (Phase 2) | SORT(,LOCAL) | The records arriving at a particular reducer are sorted into KEY order
REDUCE | ROLLUP(,Key,LOCAL) | The records for a particular KEY value are now combined together

References

Elsevier to acquire ChoicePoint for $3.6 billion
Reed Elsevier's LexisNexis Buys Seisint for $775 Mln
Elsevier