the NSF program was 2009-2010 so use past tense and fill out citation
remove more over-linking and uncited promotion
Line 2:
== Introduction ==
The rapid growth of the [[Internet]] and [[World Wide Web]] led to vast amounts of information available online. In addition, business and government organizations create large amounts of both structured and [[unstructured information]] which needs to be processed, analyzed, and linked. [[Vinton Cerf]]
[[Parallel processing]] approaches can be generally classified as either ''compute-intensive'' or ''data-intensive''.<ref>[http://portal.acm.org/citation.cfm?id=280278 Models and languages for parallel computation], by D.B. Skillicorn, and D. Talia, ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 123-169.</ref><ref>[http://www.pnl.gov/science/images/highlights/computing/dic_special.pdf Data-Intensive Computing in the 21st Century], by I. Gorton, P. Greenfield, A. Szalay, and R. Williams, IEEE Computer, Vol. 41, No. 4, 2008, pp. 30-32.</ref><ref>[http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2008.122 High-Speed, Wide Area, Data Intensive Computing: A Ten Year Retrospective], by W.E. Johnston, IEEE Computer Society, 1998.</ref> ''Compute-intensive'' describes application programs that are compute-bound. Such applications devote most of their execution time to computation rather than I/O, and typically require small volumes of data.
''Data-intensive'' describes applications that are I/O-bound or that need to process large volumes of data.<ref>[https://computation.llnl.gov/casc/dcca-pub/dcca/Papers_files/data-intensive-ieee-computer-0408.pdf IEEE: Hardware Technologies for High-Performance Data-Intensive Computing], by M. Gokhale, J. Cohen, A. Yoo, and W.M. Miller, IEEE Computer, Vol. 41, No. 4, 2008, pp. 60-68.</ref> Such applications devote most of their processing time to I/O and to the movement and manipulation of data. [[Parallel processing]] of data-intensive applications typically involves partitioning the data into multiple segments that can be processed independently, using the same executable application program in parallel on an appropriate computing platform, and then reassembling the results to produce the completed output data.<ref>[http://www.agoldberg.org/Publications/DesignMethForDP.pdf IEEE: A Design Methodology for Data-Parallel Applications], by L.S. Nyland, J.F. Prins, A. Goldberg, and P.H. Mills, IEEE Transactions on Software Engineering, Vol. 26, No. 4, 2000, pp. 293-314.</ref> The greater the aggregate distribution of the data, the more benefit there is in parallel processing of the data. Data-intensive processing requirements normally scale linearly with the size of the data and are very amenable to straightforward parallelization. The fundamental challenges for data-intensive computing are managing and processing exponentially growing data volumes, significantly reducing associated data analysis cycles to support practical, timely applications, and developing new algorithms that can scale to search and process massive amounts of data.
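The partition, process, and reassemble pattern described above can be illustrated with a minimal sketch. It is not taken from any of the cited sources: the function names (<code>partition</code>, <code>count_words</code>, <code>data_parallel_word_count</code>) and the word-counting task are hypothetical choices made for illustration, and a local thread pool stands in for the cluster of processing nodes that a real data-intensive platform would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_segments):
    """Split records into num_segments contiguous, roughly equal segments."""
    size, extra = divmod(len(records), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        # The first `extra` segments each take one additional record.
        end = start + size + (1 if i < extra else 0)
        segments.append(records[start:end])
        start = end
    return segments


def count_words(segment):
    """The same program, applied independently to each segment."""
    counts = Counter()
    for line in segment:
        counts.update(line.split())
    return counts


def data_parallel_word_count(records, num_segments=4):
    """Partition the data, process segments in parallel, reassemble results."""
    segments = partition(records, num_segments)
    # Assumption: a thread pool stands in for separate processing nodes.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partial_counts = list(pool.map(count_words, segments))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds counts together
    return total
```

Because each segment is processed without reference to any other, the number of segments can be increased as the data grows; only the final merge step touches all of the partial results.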
Researchers coined the term BORPS for "billions of records per second" to measure record processing speed in a way analogous to how the term [[Million instructions per second|MIPS]] applies to describe computers' processing speed.<ref>[http://www.cse.fau.edu/~borko/HandbookofCloudComputing.html/ Handbook of Cloud Computing], "Data-Intensive Technologies for Cloud Computing," by A.M. Middleton. Handbook of Cloud Computing. Springer, 2010, pp. 83-86.</ref>
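As a rough illustration of the unit, a BORPS figure can be derived by dividing the number of records processed by the elapsed time in seconds and then by one billion. The helper below is a hypothetical sketch of that arithmetic, not a measurement method given in the cited handbook.

```python
def borps(records_processed, elapsed_seconds):
    """Return throughput in billions of records per second (BORPS)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return records_processed / elapsed_seconds / 1e9
```

For example, a job that processes three billion records in two seconds would be rated at <code>borps(3_000_000_000, 2.0)</code>, or 1.5 BORPS.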
== Data-parallelism ==
Computer system architectures which can support [[data parallel]] applications
The US [[National Science Foundation]] (NSF) funded a research program from 2009 through 2010.<ref>{{Cite web |title= Data-intensive Computing |work= Program description |year= 2009 |publisher= NSF |url= http://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS |accessdate= November 2013 }}</ref> Areas of focus were:
* Approaches to [[parallel programming]] to address the [[parallel processing]] of data on data-intensive systems
Line 19 ⟶ 18:
* Identifying applications that can exploit this computing paradigm and determining how it should evolve to support emerging data-intensive applications
[[Pacific Northwest National Labs]] defined data-intensive computing as “capturing, managing, analyzing, and understanding data at volumes and rates that push the frontiers of current technologies”.<ref>[http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt Data Intensive Computing], by PNNL, 2008.</ref><ref>[http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2009.26 The Changing Paradigm of Data-Intensive Computing], by R.T. Kouzes, G.A. Anderson, S.T. Elbert, I. Gorton, and D.K. Gracio, Computer, Vol. 42, No. 1, 2009, pp. 26-3</ref>
== Approach ==