Data-intensive computing

 
== Approach ==
Data-intensive computing platforms typically use a [[parallel computing]] approach that combines multiple processors and disks in large commodity [[Cluster (computing)|computing clusters]] connected by high-speed communications switches and networks. This allows the data to be partitioned among the available computing resources and processed independently to achieve performance and scalability based on the amount of data. A cluster can be defined as a type of parallel and [[distributed system]] that consists of a collection of interconnected stand-alone computers working together as a single integrated computing resource.<ref>{{Cite journal |last=Buyya |first=Rajkumar |last2=Yeo |first2=Chee Shin |last3=Venugopal |first3=Srikumar |last4=Broberg |first4=James |last5=Brandic |first5=Ivona |author-link5=Ivona Brandić |date=2009 |title=Cloud computing and emerging IT platforms: Vision, hype, and reality for delivering computing as the 5th utility |url=http://www.sciencedirect.com/science/article/pii/S0167739X08001957 |journal=Future Generation Computer Systems |volume=25 |issue=6 |pages=599–616 |doi=10.1016/j.future.2008.12.001}}</ref> This approach to parallel processing is often referred to as a "shared nothing" approach, since each node, consisting of processor, local memory, and disk resources, shares nothing with other nodes in the cluster. In parallel computing, this approach is considered suitable for data-intensive computing and for problems that are "embarrassingly parallel", i.e. where it is relatively easy to separate the problem into a number of parallel tasks and no dependency or communication is required between the tasks other than their overall management. These types of data processing problems are inherently adaptable to various forms of [[distributed computing]], including clusters, data grids, and [[cloud computing]].
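
The partition-and-process pattern described above can be illustrated with a minimal Python sketch. It is an illustration only, not the implementation of any particular platform: the function names (<code>partition</code>, <code>count_words</code>, <code>merge</code>, <code>run</code>) are hypothetical, a word count is used as an assumed example of an embarrassingly parallel task, and threads on one machine stand in for the independent nodes of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Split records into num_nodes disjoint chunks (round-robin)."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def count_words(chunk):
    """Task for one hypothetical node: it reads only its own chunk."""
    counts = {}
    for line in chunk:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def merge(partials):
    """Overall management step: combine the per-node results."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total


def run(records, num_nodes=4):
    chunks = partition(records, num_nodes)
    # Threads stand in for cluster nodes (an assumption of this sketch);
    # the tasks exchange no data with each other while they run.
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(count_words, chunks))
    return merge(partials)


if __name__ == "__main__":
    print(run(["a b a", "b c", "a"], num_nodes=2))
```

Each task receives its own disjoint chunk and returns a private result, so the only coordination is the initial partitioning and the final merge. On an actual cluster the chunks would typically reside on each node's local disk rather than in a shared in-memory list.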
 
== Characteristics ==