Data-intensive computing: Difference between revisions

Content deleted Content added
reference to an inexistent figure
No edit summary
Line 58:
 
To address both batch and online aspects data-intensive computing applications, [[HPCC]] includes two distinct cluster environments, each of which can be optimized independently for its parallel data processing purpose. The Thor platform is a cluster whose purpose is to be a data refinery for processing of massive volumes of raw data for applications such as data cleansing and hygiene, [[Extract, transform, load|ETL]] (extract, transform load), record linking and entity resolution, large-scale ad-hoc analysis of data, and creation of keyed data and indexes to support high-performance structured queries and data warehouse applications. A Thor system is similar in its hardware configuration, function, execution environment, filesystem, and capabilities to the Hadoop MapReduce platform, but provides higher performance in equivalent configurations. The Roxie platform provides an online high-performance structured query and analysis system or data warehouse delivering the parallel data access processing requirements of online applications through Web services interfaces supporting thousands of simultaneous queries and users with sub-second response times. A Roxie system is similar in its function and capabilities to [[Hadoop]] with [[HBase]] and [[Apache Hive|Hive]] capabilities added, but provides an optimized execution environment and filesystem for high-performance online processing. Both Thor and Roxie systems utilize the same ECL programming language for implementing applications, increasing programmer productivity.
 
 
==Conferences==
* [http://www.lucentia.es/workshop/modic12/ 1st International Workshop on Modeling for Data-Intensive Computing (MoDIC 2012) ]. To be held in conjunction with the 31st International Conference on Conceptual Modelling, (ER2012).
 
 
 
== See also ==