Data-intensive computing: Difference between revisions

Content deleted Content added
Link suggestions feature: 3 links added.
Tags: Visual edit Mobile edit Mobile web edit Newcomer task Suggested: add links
Rescuing 1 sources and tagging 0 as dead.) #IABot (v2.0.9.5
Line 10:
 
== Data-parallelism ==
Computer system architectures which can support [[data parallel]] applications were promoted in the early 2000s for large-scale data processing requirements of data-intensive computing.<ref>[http://www.patrickpantel.com/download/papers/2004/kdd-msw04-1.pdf The terascale challenge] by D. Ravichandran, P. Pantel, and E. Hovy. "The terascale challenge," Proceedings of the KDD Workshop on Mining for and from the Semantic Web, 2004</ref> Data-parallelism applied computation independently to each data item of a set of data, which allows the degree of parallelism to be scaled with the volume of data. The most important reason for developing data-parallel applications is the potential for scalable performance, and may result in several orders of magnitude performance improvement. The key issues with developing applications using data-parallelism are the choice of the algorithm, the strategy for data decomposition, [[load balancing (computing)|load balancing]] on processing nodes, [[message passing]] communications between nodes, and the overall accuracy of the results.<ref>[http://www.cs.rochester.edu/u/umit/papers/ppopp01.ps Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations] {{Webarchive|url=https://web.archive.org/web/20110720035435/http://www.cs.rochester.edu/u/umit/papers/ppopp01.ps |date=2011-07-20 }} by U. Rencuzogullari, and [[Sandhya Dwarkadas|S. Dwarkadas]]. "Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations," Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practices of Parallel Programming, 2001</ref> The development of a data parallel application can involve substantial programming complexity to define the problem in the context of available programming tools, and to address limitations of the target architecture. [[Information extraction]] from and indexing of Web documents is typical of data-intensive computing which can derive significant performance benefits from data parallel implementations since Web and other types of document collections can typically then be processed in parallel.<ref>[http://www.mathcs.emory.edu/~eugene/publications.html Information Extraction to Large Document Collections] {{Webarchive|url=https://web.archive.org/web/20110415003825/http://www.mathcs.emory.edu/~eugene/publications.html |date=2011-04-15 }} by E. Agichtein, "Scaling Information Extraction to Large Document Collections," Microsoft Research, 2004</ref>
 
The US [[National Science Foundation]] (NSF) funded a research program from 2009 through 2010.<ref>{{Cite web |title= Data-intensive Computing |work= Program description |year= 2009 |publisher= NSF |url= https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS |accessdate=24 April 2017 }}</ref> Areas of focus were: