[[LexisNexis Risk Solutions]] independently developed and implemented a solution for data-intensive computing called [[HPCC]] (High-Performance Computing Cluster). Development of this computing platform began in 1999, and applications were in production by late 2000. The [[LexisNexis]] approach also uses clusters of commodity hardware running the [[Linux]] operating system, as shown in Figure 1. Custom system software and middleware components were developed and layered on the base Linux operating system to provide the execution environment and distributed filesystem support required for data-intensive computing. LexisNexis also implemented a new high-level language for data-intensive computing called ECL.
The [[ECL, data-centric programming language for Big Data|ECL programming language]] is the primary distinguishing factor between HPCC and other data-intensive computing solutions. It is a high-level, declarative, data-centric, [[implicitly parallel]] language in which the programmer defines what the data processing result should be, together with the dataflows and transformations needed to achieve it. The [[ECL]] language includes extensive capabilities for data definition, filtering, data management, and data transformation. It also provides an extensive set of built-in functions that operate on the records in datasets and that can include user-defined transformation functions. [[ECL, data-centric programming language for Big Data|ECL]] programs are compiled into optimized [[C++]] source code, which is subsequently compiled into executable code and distributed to the nodes of a processing cluster.
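The declarative dataflow style described above can be illustrated with a minimal sketch. It is written in Python rather than ECL, and it does not reproduce ECL syntax, the ECL compiler, or implicit parallelism. The names <code>Person</code>, <code>filter_records</code>, <code>project</code>, and <code>pipeline</code> are hypothetical and introduced only for illustration. The point is that the program states which derived dataset is wanted as a chain of filters and transformations, and nothing is evaluated until the result is requested.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    """Hypothetical record layout, used only for this illustration."""
    name: str
    age: int


def filter_records(predicate):
    # Describes a filtering step; returns a lazy generator when applied.
    return lambda records: (r for r in records if predicate(r))


def project(transform):
    # Describes a per-record transformation step (user-defined function).
    return lambda records: (transform(r) for r in records)


def pipeline(*steps):
    # Composes the described steps into one dataflow definition.
    def run(records):
        for step in steps:
            records = step(records)
        return list(records)  # evaluation happens only here
    return run


# Definition of the desired result: upper-cased names of adults.
adult_names = pipeline(
    filter_records(lambda p: p.age >= 18),
    project(lambda p: p.name.upper()),
)

people = [Person("Alice", 34), Person("Bob", 17), Person("Carol", 52)]
result = adult_names(people)
```

In this sketch the definition of <code>adult_names</code> is separate from its execution, which is the property that lets a compiler or runtime decide how to schedule and distribute the work.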
To address both the batch and online aspects of data-intensive computing applications, [[HPCC]] includes two distinct cluster environments, each of which can be optimized independently for its parallel data processing purpose. The Thor platform is a cluster that serves as a data refinery, processing massive volumes of raw data for applications such as data cleansing and hygiene, [[ETL]] (extract, transform, load), record linking and entity resolution, large-scale ad-hoc analysis of data, and creation of keyed data and indexes to support high-performance structured queries and data warehouse applications. A Thor system is similar in its hardware configuration, function, execution environment, filesystem, and capabilities to the Hadoop MapReduce platform, but provides higher performance in equivalent configurations. The Roxie platform provides an online, high-performance structured query and analysis system, or data warehouse. It meets the parallel data access requirements of online applications through Web services interfaces, supporting thousands of simultaneous queries and users with sub-second response times. A Roxie system is similar in its function and capabilities to [[Hadoop]] with [[HBase]] and [[Hive]] capabilities added, but provides an optimized execution environment and filesystem for high-performance online processing. Both Thor and Roxie systems use the same ECL programming language for implementing applications, which increases programmer productivity.
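The division of labor between a batch refinery and an online query system can be sketched as follows. This is a single-process Python illustration under stated assumptions, not HPCC code: <code>build_index</code> stands in for the kind of cleansing and keyed-index creation attributed to Thor, and <code>query</code> stands in for the kind of keyed lookup attributed to Roxie. Both function names and the sample records are hypothetical.

```python
from collections import defaultdict


def build_index(raw_records):
    """Batch stage: cleanse raw records and build a keyed index."""
    index = defaultdict(list)
    for rec in raw_records:
        name = rec["name"].strip().title()  # normalize the key
        if not name:
            continue  # drop records that have no usable key
        index[name].append({"name": name, "city": rec["city"].strip()})
    return dict(index)


def query(index, name):
    """Online stage: answer a lookup from the prebuilt index."""
    return index.get(name.strip().title(), [])


# Hypothetical raw input containing typical hygiene problems.
raw = [
    {"name": "  alice smith ", "city": "Boca Raton"},
    {"name": "BOB JONES", "city": " Atlanta "},
    {"name": "", "city": "Unknown"},
]

index = build_index(raw)             # done once, offline
hits = query(index, "alice smith")   # done per request, online
```

The expensive cleansing and indexing work is done once in the batch stage, so each online request reduces to a key lookup. That is the general reason for optimizing the two environments independently.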