Revision as of 20:49, 24 November 2013, by W Nowicki. Edit summary: reduce overlinking and over-use of Capital Letters
Revision as of 20:54, 24 November 2013, by W Nowicki. Edit summary: the NSF program was 2009-2010 so use past tense and fill out citation
Line 12:
== Characteristics ==
The [[National Science Foundation]] ~~believes that data-intensive computing requires a “fundamentally different set of principles” than current computing approaches.<ref>[http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS Data-Intensive Computing] by NSF. "Data-Intensive Computing," 2009.</ref>~~ (NSF) funded a research program from 2009 through 2010.<ref>{{Cite web |title= Data-intensive Computing |work= Program description |year= 2009 |publisher= NSF |url= http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS |accessdate= November 2013 }}</ref> Through a funding program within the Computer and Information Science and Engineering area, the NSF is seeking to “increase understanding of the capabilities and limitations of data-intensive computing.” ~~The key areas~~ Areas of focus ~~are~~ were:
* Approaches to [[parallel programming]] to address the [[parallel processing]] of data on data-intensive systems
Line 19:
* Identifying applications that can exploit this computing paradigm and determining how it should evolve to support emerging data-intensive applications

[[Pacific Northwest National Labs]] defined data-intensive computing as “capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies”.<ref>[http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt Data Intensive Computing] by PNNL. "Data Intensive Computing," 2008</ref><ref>[http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2009.26 The Changing Paradigm of Data-Intensive Computing] by R.T. Kouzes, G.A. Anderson, S.T. Elbert, I. Gorton, and D.K. Gracio, "The Changing Paradigm of Data-Intensive Computing," Computer, Vol. 42, No. 1, 2009, pp. 26-3</ref> They believe that to address the rapidly growing data volumes and complexity ~~requires~~ required “epochal advances in software, hardware, and algorithm development” which can scale readily with size of the data and provide effective and timely analysis and processing results.
== ~~Processing~~ Approach ==
Data-intensive computing platforms typically use a [[parallel computing]] approach combining multiple processors and disks in large commodity [[Cluster (computing)|computing clusters]] connected using high-speed communications switches and networks which allows the data to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data. A cluster can be defined as a type of parallel and [[distributed system]], which consists of a collection of inter-connected stand-alone computers working together as a single integrated computing resource.<ref>[http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V06-4V47C7R-1&_user=10&_coverDate=06%2F30%2F2009&_rdoc=1&_fmt=high&_orig=gateway&_origin=gateway&_sort=d&_docanchor=&view=c&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=824e4c2635a53c6fe068f3f2d11df096&searchtype=a Cloud computing and emerging IT platforms] by R. Buyya, C.S. Yeo, S. Venugopal, J. Broberg, and I. Brandic, "Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility," Future Generation Computer Systems, Vol. 25, No. 6, 2009, pp. 599-616</ref> This approach to parallel processing is often referred to as a “shared nothing” approach since each node consisting of processor, local memory, and disk resources shares nothing with other nodes in the cluster. In [[parallel computing]] this approach is considered suitable for data-intensive computing and problems which are “embarrassingly parallel”, i.e. where it is relatively easy to separate the problem into a number of parallel tasks and there is no dependency or communication required between the tasks other than overall management of the tasks. These types of data processing problems are inherently adaptable to various forms of [[distributed computing]] including clusters, data grids, and [[cloud computing]].
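
The partition-and-process-independently pattern described in this section can be illustrated with a minimal sketch. It is not taken from the article or from any particular platform: the function names `partition`, `count_words`, and `run` are hypothetical, a word count is used only as a convenient embarrassingly parallel task, and threads in a single process stand in for the separate cluster nodes a real shared-nothing system would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_parts):
    """Split records into n_parts non-overlapping slices (hypothetical helper)."""
    return [records[i::n_parts] for i in range(n_parts)]


def count_words(part):
    """Process one partition using only its own data; no state is shared."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def run(records, n_parts=4):
    """Partition the data, process each slice independently, merge the results."""
    parts = partition(records, n_parts)
    # Threads stand in for cluster nodes here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_words, parts))
    # The only coordination step: combining the independent partial results.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    lines = [
        "big data big clusters",
        "data moves to compute",
        "compute moves to data",
    ]
    print(run(lines, n_parts=2))
```

Because no task reads another task's partition, the tasks need no communication beyond the final merge, which is the property the text calls “embarrassingly parallel”.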

Data-intensive computing: Difference between revisions