Data-intensive computing: Difference between revisions

m System architectures: Reference on programming models and system for data intensive computing
Undid revision 898877161 by Jfabrizio84 (talk) not a venue to present your or your colleagues' newest research, see also WP:COI / WP:SELFCITE
Line 33:
A variety of [[system]] architectures have been implemented for data-intensive computing and large-scale data analysis applications, including parallel and distributed [[relational database management systems]], which have been available to run on shared-nothing clusters of processing nodes for more than two decades.<ref>[http://www.cse.nd.edu/~dthain/courses/cse40771/spring2010/benchmarks-sigmod09.pdf A Comparison of Approaches to Large-Scale Data Analysis] by A. Pavlo, E. Paulson, A. Rasin, D.J. Abadi, D.J. DeWitt, S. Madden, and M. Stonebraker. Proceedings of the 35th SIGMOD International Conference on Management of Data, 2009.</ref>
However, most data growth is in unstructured data, and new processing paradigms with more flexible data models were needed. Several solutions have emerged, including the [[MapReduce]] architecture pioneered by Google and now available in an open-source implementation called [[Hadoop]], which is used by [[Yahoo]], [[Facebook]], and others. [[LexisNexis|LexisNexis Risk Solutions]] also developed and implemented a scalable platform for data-intensive computing, which is used by [[LexisNexis]].
 
The paper "Programming Models and Systems for Big Data Analysis"<ref name="prog-models-big-data">{{Cite journal | doi = 10.1080/17445760.2017.1422501 | title = Programming models and systems for Big Data analysis | journal = [[International Journal of Parallel, Emergent and Distributed Systems]]| year = 2018| url = http://dx.doi.org/10.1080/17445760.2017.1422501| last1 = Belcastro | first1 = L. | last2 = Marozzo | first2 = F. | last3 = Talia | first3 = D.}}</ref> surveys the most popular programming models for data-intensive computing (MapReduce, Directed Acyclic Graph, Message Passing, Bulk Synchronous Parallel, Workflow, and SQL-like) and analyses the features of the main systems implementing them. It compares these systems using four classification criteria (level of abstraction, type of parallelism, infrastructure scale, and classes of applications) to help developers and users select the most suitable solution according to their skills, hardware availability, productivity, and application needs.
 
===MapReduce===