Data-intensive computing: Difference between revisions

Content deleted Content added
No edit summary
Line 27:
There are several important common characteristics of data-intensive computing systems that distinguish them from other forms of computing:
 
(1) The principle of collocationcollection of the data and programs or algorithms is used to perform the computation. To achieve high performance in data-intensive computing, it is important to minimize the movement of data.<ref>[http://queue.acm.org/detail.cfm?id=1394131 Distributed Computing Economics] by J. Gray, "Distributed Computing Economics," ACM Queue, Vol. 6, No. 3, 2008, pp. 63-68.</ref> This characteristic allows processing algorithms to execute on the nodes where the data resides reducing system overhead and increasing performance.<ref>[http://www.pnl.gov/science/images/highlights/computing/dic_special.pdfData-Intensive Computing in the 21st Century], by I. Gorton, P. Greenfield, A. Szalay, and R. Williams, IEEE Computer, Vol. 41, No. 4, 2008, pp. 30-32.</ref> Newer technologies such as [[InfiniBand]] allow data to be stored in a separate repository and provide performance comparable to collocated data.
 
(2) The programming model utilized. Data-intensive computing systems utilize a machine-independent approach in which applications are expressed in terms of high-level operations on data, and the runtime system transparently controls the scheduling, execution, load balancing, communications, and movement of programs and data across the distributed computing cluster.<ref>[http://www.cs.cmu.edu/~bryant/presentations/DISC-concept.ppt Data Intensive Scalable Computing] by R.E. Bryant. "Data Intensive Scalable Computing," 2008</ref> The programming abstraction and language tools allow the processing to be expressed in terms of data flows and transformations incorporating new dataflow [[programming languages]] and shared libraries of common data manipulation algorithms such as sorting.