Content deleted Content added
reduce overlinking and over-use of Capital Letters |
the NSF program was 2009-2010 so use past tense and fill out citation |
||
Line 12:
== Characteristics ==
The [[National Science Foundation]]
* Approaches to [[parallel programming]] to address the [[parallel processing]] of data on data-intensive systems
Line 19:
* Identifying applications that can exploit this computing paradigm and determining how it should evolve to support emerging data-intensive applications
[[Pacific Northwest National Labs]] defined data-intensive computing as “capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies”.<ref>[http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt Data Intensive Computing] by PNNL. "Data Intensive Computing," 2008</ref><ref>[http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2009.26 The Changing Paradigm of Data-Intensive Computing] by R.T. Kouzes, G.A. Anderson, S.T. Elbert, I. Gorton, and D.K. Gracio, "The Changing Paradigm of Data-Intensive Computing," Computer, Vol. 42, No. 1, 2009, pp. 26-3</ref> They believe that to address the rapidly growing data volumes and complexity
==
Data-intensive computing platforms typically use a [[parallel computing]] approach combining multiple processors and disks in large commodity [[Cluster (computing)|computing clusters]] connected using high-speed communications switches and networks which allows the data to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data. A cluster can be defined as a type of parallel and [[distributed system]], which consists of a collection of inter-connected stand-alone computers working together as a single integrated computing resource.<ref>[http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V06-4V47C7R-1&_user=10&_coverDate=06%2F30%2F2009&_rdoc=1&_fmt=high&_orig=gateway&_origin=gateway&_sort=d&_docanchor=&view=c&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=824e4c2635a53c6fe068f3f2d11df096&searchtype=a Cloud computing and emerging IT platforms] by R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, Vol. 25, No. 6, 2009, pp. 599-616</ref> This approach to parallel processing is often referred to as a “shared nothing” approach since each node consisting of processor, local memory, and disk resources shares nothing with other nodes in the cluster. In [[parallel computing]] this approach is considered suitable for data-intensive computing and problems which are “embarrassingly parallel”, i.e. where it is relatively easy to separate the problem into a number of parallel tasks and there is no dependency or communication required between the tasks other than overall management of the tasks. These types of data processing problems are inherently adaptable to various forms of [[distributed computing]] including clusters, data grids, and [[cloud computing]].
|