Data-intensive computing: Difference between revisions

reduce overlinking and over-use of Capital Letters
the NSF program was 2009-2010 so use past tense and fill out citation
Line 12:
 
== Characteristics ==
The [[National Science Foundation]] (NSF) funded a research program from 2009 through 2010.<ref>{{Cite web |title= Data-intensive Computing |work= Program description |year= 2009 |publisher= NSF |url= http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS |accessdate= November 2013 }}</ref> Through a funding program within the Computer and Information Science and Engineering area, the NSF sought to “increase understanding of the capabilities and limitations of data-intensive computing.” Areas of focus were:
 
* Approaches to [[parallel programming]] to address the [[parallel processing]] of data on data-intensive systems
Line 19:
* Identifying applications that can exploit this computing paradigm and determining how it should evolve to support emerging data-intensive applications
 
[[Pacific Northwest National Labs]] defined data-intensive computing as “capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies”.<ref>[http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt Data Intensive Computing] by PNNL. "Data Intensive Computing," 2008</ref><ref>[http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2009.26 The Changing Paradigm of Data-Intensive Computing] by R.T. Kouzes, G.A. Anderson, S.T. Elbert, I. Gorton, and D.K. Gracio, "The Changing Paradigm of Data-Intensive Computing," Computer, Vol. 42, No. 1, 2009, pp. 26-3</ref> They believed that addressing the rapidly growing data volumes and complexity required “epochal advances in software, hardware, and algorithm development” that scale readily with the size of the data and provide effective and timely analysis and processing results.
 
== Processing approach ==
Data-intensive computing platforms typically use a [[parallel computing]] approach that combines multiple processors and disks in large commodity [[Cluster (computing)|computing clusters]] connected by high-speed communications switches and networks. This arrangement allows the data to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data. A cluster can be defined as a type of parallel and [[distributed system]] that consists of a collection of interconnected stand-alone computers working together as a single integrated computing resource.<ref>[http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V06-4V47C7R-1&_user=10&_coverDate=06%2F30%2F2009&_rdoc=1&_fmt=high&_orig=gateway&_origin=gateway&_sort=d&_docanchor=&view=c&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=824e4c2635a53c6fe068f3f2d11df096&searchtype=a Cloud computing and emerging IT platforms] by R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, Vol. 25, No. 6, 2009, pp. 599-616</ref> This approach to parallel processing is often referred to as a “shared nothing” approach, since each node, consisting of processor, local memory, and disk resources, shares nothing with other nodes in the cluster. In parallel computing this approach is considered suitable for data-intensive computing and for problems that are “embarrassingly parallel”, i.e. where it is relatively easy to separate the problem into a number of parallel tasks and no dependency or communication is required between the tasks other than their overall management. These types of data processing problems are inherently adaptable to various forms of [[distributed computing]], including clusters, data grids, and [[cloud computing]].
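
The partition-and-process pattern described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited sources, and it makes several simplifying assumptions: threads stand in for cluster nodes, the data set is a small in-memory list, and the names <code>partition</code>, <code>count_words</code>, and <code>run_job</code> are hypothetical. Each task receives only its own partition and exchanges nothing with the other tasks; the coordinator's only role is to hand out partitions and merge the partial results.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records into num_nodes disjoint partitions (round-robin)."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(partition_records):
    """Work done by one node: it sees only its own partition."""
    counts = Counter()
    for line in partition_records:
        counts.update(line.lower().split())
    return counts


def run_job(records, num_nodes=4):
    """Coordinator: distribute partitions, then merge the partial results."""
    parts = partition(records, num_nodes)
    # Assumption: threads stand in for cluster nodes in this sketch.
    # No task reads another task's partition or intermediate state.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    sample = ["big data", "data intensive computing", "big clusters", "data"]
    print(run_job(sample, num_nodes=3))
```

Because the tasks are independent, changing <code>num_nodes</code> changes only how the work is divided, not the merged result. On a real cluster the partitions would reside on each node's local disk rather than in a shared list.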