Revision as of 14:50, 16 July 2025 by RankASea (Link suggestions feature: 3 links added.)
Revision as of 19:57, 17 August 2025 by InternetArchiveBot (Rescuing 1 sources and tagging 0 as dead.) #IABot (v2.0.9.5)
Line 10:
== Data-parallelism ==
Computer system architectures that can support [[data parallel]] applications were promoted in the early 2000s to meet the large-scale data processing requirements of data-intensive computing.<ref>[http://www.patrickpantel.com/download/papers/2004/kdd-msw04-1.pdf The terascale challenge] by D. Ravichandran, P. Pantel, and E. Hovy. "The terascale challenge," Proceedings of the KDD Workshop on Mining for and from the Semantic Web, 2004</ref> Data-parallelism applies computation independently to each data item in a set, which allows the degree of parallelism to scale with the volume of data. The most important reason for developing data-parallel applications is the potential for scalable performance, which may yield improvements of several orders of magnitude. The key issues in developing applications using data-parallelism are the choice of algorithm, the strategy for data decomposition, [[load balancing (computing)|load balancing]] on processing nodes, [[message passing]] communications between nodes, and the overall accuracy of the results.<ref>[http://www.cs.rochester.edu/u/umit/papers/ppopp01.ps Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations] {{Webarchive|url=https://web.archive.org/web/20110720035435/http://www.cs.rochester.edu/u/umit/papers/ppopp01.ps |date=2011-07-20 }} by U. Rencuzogullari and [[Sandhya Dwarkadas|S. Dwarkadas]]. "Dynamic adaptation to available resources for parallel computing in an autonomous network of workstations," Proceedings of the Eighth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2001</ref> Developing a data-parallel application can involve substantial programming complexity, both to define the problem in terms of the available programming tools and to address limitations of the target architecture.
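As an illustration only, the idea of decomposing a data set and applying the same computation independently to each item can be sketched in Python. This is a minimal sketch, not a method taken from the cited sources. The function names (<code>decompose</code>, <code>count_tokens</code>, <code>parallel_token_counts</code>) are hypothetical. A thread pool stands in for the processing nodes. Splitting the input into near-equal contiguous chunks is only one assumed decomposition strategy, and a crude form of static load balancing.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(items, num_chunks):
    """Split items into num_chunks contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_tokens(chunk):
    """Per-chunk work: each document is processed independently of the others."""
    return [len(doc.split()) for doc in chunk]


def parallel_token_counts(docs, workers=4):
    """Apply count_tokens to every chunk concurrently, preserving input order."""
    chunks = decompose(docs, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order the chunks were submitted.
        results = pool.map(count_tokens, chunks)
    return [n for chunk_result in results for n in chunk_result]


if __name__ == "__main__":
    docs = ["a b c", "d e", "f", "g h i j", "k"]
    print(parallel_token_counts(docs, workers=2))
```

Because no chunk depends on any other, the number of workers can grow with the volume of data. In a real data-intensive system the chunks would typically be distributed across processes or cluster nodes, and the decomposition would have to account for uneven per-item cost and communication overhead.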
[[Information extraction]] from, and indexing of, Web documents is typical of data-intensive computing that can benefit substantially from data-parallel implementations, since Web and other types of document collections can typically be processed in parallel.<ref>[http://www.mathcs.emory.edu/~eugene/publications.html Information Extraction to Large Document Collections] {{Webarchive|url=https://web.archive.org/web/20110415003825/http://www.mathcs.emory.edu/~eugene/publications.html |date=2011-04-15 }} by E. Agichtein, "Scaling Information Extraction to Large Document Collections," Microsoft Research, 2004</ref> The US [[National Science Foundation]] (NSF) funded a research program from 2009 through 2010.<ref>{{Cite web |title= Data-intensive Computing |work= Program description |year= 2009 |publisher= NSF |url= https://www.nsf.gov/funding/pgm_summ.jsp?pims_id=503324&org=IIS |accessdate=24 April 2017 }}</ref> Areas of focus were:

Data-intensive computing: Difference between revisions