Revision as of 14:37, 28 September 2015 edit BD2412 (talk \| contribs) Autopatrolled, Administrators 2,527,267 edits m minor fixes, mostly disambig links using AWB ← Previous edit		Revision as of 07:29, 7 December 2016 edit undo InternetArchiveBot (talk \| contribs) Bots, Pending changes reviewers 5,673,714 edits Rescuing 1 sources and tagging 0 as dead. #IABot (v1.2.7.1) Next edit →
Line 2: == Introduction == The rapid growth of the [[Internet]] and [[World Wide Web]] led to vast amounts of information available online. In addition, business and government organizations create large amounts of both structured and [[unstructured information]] which needs to be processed, analyzed, and linked. [[Vinton Cerf]] described this as an “information avalanche” and stated “we must harness the Internet’s energy before the information it has unleashed buries us”.<ref>[http://research.google.com/pubs/author32412.html An Information Avalanche], by Vinton Cerf, IEEE Computer, Vol. 40, No. 1, 2007, pp. 104-105.</ref> An [[International Data Corporation\|IDC]] white paper sponsored by [[EMC Corporation]] estimated the amount of information currently stored in a digital form in 2007 at 281 exabytes and the overall compound growth rate at 57% with information in organizations growing at even a faster rate.<ref>[http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf The Expanding Digital Universe] {{wayback\|url=http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf \|date=20130310000000 }}, by J.F. Gantz, D. Reinsel, C. Chute, W. Schlichting, J. McArthur, S. Minton, J. Xheneti, A. Toncheva, and A. Manfrediz, [[International Data Corporation\|IDC]], White Paper, 2007.</ref> In a 2003 study of the so-called information explosion it was estimated that 95% of all current information exists in unstructured form with increased data processing requirements compared to structured information.<ref>[http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/ How Much Information? 2003], by P. Lyman, and H.R. Varian, University of California at Berkeley, Research Report, 2003.</ref> The storing, managing, accessing, and processing of this vast amount of data represents a fundamental need and an immense challenge in order to satisfy needs to search, analyze, mine, and visualize this data as information.<ref>[http://www.sdsc.edu/about/director/pubs/communications200812-DataDeluge.pdf Got Data? A Guide to Data Preservation in the Information Age], by F. Berman, Communications of the ACM, Vol. 51, No. 12, 2008, pp. 50-56.</ref> Data-intensive computing is intended to address this need. [[Parallel computing\|Parallel processing]] approaches can be generally classified as either ''compute-intensive'', or ''data-intensive''.<ref>[http://portal.acm.org/citation.cfm?id=280278 Models and languages for parallel computation], by D.B. Skillicorn, and D. Talia, ACM Computing Surveys, Vol. 30, No. 2, 1998, pp. 123-169.</ref><ref>[http://www.pnl.gov/science/images/highlights/computing/dic_special.pdfData-Intensive Computing in the 21st Century], by I. Gorton, P. Greenfield, A. Szalay, and R. Williams, IEEE Computer, Vol. 41, No. 4, 2008, pp. 30-32.</ref><ref>[http://www.computer.org/portal/web/csdl/doi/10.1109/MC.2008.122 High-Speed, Wide Area, Data Intensive Computing: A Ten Year Retrospective], by W.E. Johnston, IEEE Computer Society, 1998.</ref> Compute-intensive is used to describe application programs that are compute bound. Such applications devote most of their execution time to computational requirements as opposed to I/O, and typically require small volumes of data. Parallel processing of compute-intensive applications typically involves parallelizing individual algorithms within an application process, and decomposing the overall application process into separate tasks, which can then be executed in parallel on an appropriate computing platform to achieve overall higher performance than serial processing. In compute-intensive applications, multiple operations are performed simultaneously, with each operation addressing a particular part of the problem. This is often referred to as task [[parallel computing\|parallelism]].

Data-intensive computing: Difference between revisions