Computer cluster: Difference between revisions

No edit summary
Tags: Reverted Visual edit Mobile edit Mobile web edit
add link to beowulf clusters
 
(8 intermediate revisions by 5 users not shown)
Line 1:
{{Short description|Set of computers configured in a distributed computing system}}
 
{{Distinguish|data cluster|grid computing}}
{{Redirect|Cluster computing|the journal|Cluster Computing (journal)}}
[[File:MEGWARE.CLIC.jpg|thumb|Technicians working on a large [[Linux]] cluster at the [[Chemnitz University of Technology]], Germany]]
[[File:Sun Microsystems Solaris computer cluster.jpg|thumb|Sun Microsystems [[Solaris Cluster]], with [[Close Coupled Cooling#In-Row Air Conditioners|In-Row cooling]]]]
[[File:Taiwania series.jpg|thumb|[[Taiwania_(supercomputer)|Taiwania]] series uses cluster architecture.]]
 
A '''computer cluster''' is a set of [[computer]]s that work together so that they can be viewed as a single system. Unlike [[Grid computing|grid computer]]s, computer clusters have each [[Node (networking)|node]] set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is [[cloud computing]].
 
The components of a cluster are usually connected to each other through fast [[local area network]]s, with each [[Node (networking)|node]] (computer used as a server) running its own instance of an [[operating system]]. In most circumstances, all of the nodes use the same hardware<ref>{{cite web |url=https://stackoverflow.com/questions/9723040/what-is-the-difference-between-cloud-grid-and-cluster |title=Cluster vs grid computing |website=[[Stack Overflow]]}}</ref>{{better source needed|date=June 2017}} and the same operating system, although in some setups (e.g. using [[Open Source Cluster Application Resources]] (OSCAR)), different operating systems or different hardware can be used on each computer.<ref name=pcauthority>{{cite web|url=http://www.pcauthority.com.au/Feature/306972,weekend-project-build-your-own-supercomputer.aspx|title=Weekend Project: Build your own supercomputer|date=29 June 2012|first=Darien|last=Graham-Smith|website=PC & Tech Authority|access-date=2 June 2017}}</ref>
 
Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.<ref>{{cite web|url=http://www.cc.gatech.edu/~bader/papers/ijhpca.html|title=Cluster Computing: Applications|last1=Bader|first1=David|author-link=David Bader (computer scientist)|date=May 2001|publisher=[[Georgia Institute of Technology College of Computing|Georgia Tech College of Computing]]|first2=Robert|last2=Pennington|access-date=2017-02-28|archive-url=https://web.archive.org/web/20071221011621/http://www.cc.gatech.edu/~bader/papers/ijhpca.html|archive-date=2007-12-21|url-status=dead}}</ref>
 
Computer clusters emerged as a result of the convergence of a number of computing trends including the availability of low-cost microprocessors, high-speed networks, and software for high-performance [[distributed computing]].{{citation needed|date=October 2014}} They have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest [[supercomputer]]s in the world such as [[IBM Sequoia|IBM's Sequoia]].<ref>{{cite web|url=https://www.telegraph.co.uk/technology/9338651/Nuclear-weapons-supercomputer-reclaims-world-speed-record-for-US.html |archive-url=https://ghostarchive.org/archive/20220112/https://www.telegraph.co.uk/technology/9338651/Nuclear-weapons-supercomputer-reclaims-world-speed-record-for-US.html |archive-date=2022-01-12 |url-access=subscription |url-status=live|title=Nuclear weapons supercomputer reclaims world speed record for US|publisher=The Telegraph|date=18 Jun 2012|access-date=18 Jun 2012}}{{cbignore}}</ref> Prior to the advent of clusters, single-unit [[fault tolerant]] [[mainframes]] with [[Triple modular redundancy|modular redundancy]] were employed, but the lower upfront cost of clusters and the increased speed of network fabric have favoured the adoption of clusters. In contrast to high-reliability mainframes, clusters are cheaper to scale out but are more complex in error handling, because in clusters error modes are not opaque to running programs.<ref>{{cite book |last1=Gray |first1=Jim |last2=Rueter |first2=Andreas |title=Transaction processing : concepts and techniques |url=https://archive.org/details/transactionproce0000gray |url-access=registration |date=1993 |publisher=Morgan Kaufmann Publishers |isbn=978-1558601901}}</ref>
 
==Basic concepts==
[[File:Beowulf.jpg|thumb|150px|A simple, home-built [[Beowulf cluster]]]]
The desire to get more computing power and better reliability by orchestrating a number of low-cost [[commercial off-the-shelf]] computers has given rise to a variety of architectures and configurations.
 
The computer clustering approach usually (but not always) connects a number of readily available computing nodes (e.g. personal computers used as servers) via a fast [[local area network]].<ref name=nbis>{{cite conference|title=Network-Based Information Systems: First International Conference, NBIS 2007|isbn=978-3-540-74572-3|page=375|last1=Enokido|first1=Tomoya|last2=Barolli|first2=Leonhard|last3=Takizawa|first3=Makoto|date=23 August 2007}}</ref> The activities of the computing nodes are orchestrated by "clustering middleware", a software layer that sits atop the nodes and allows the users to treat the cluster as by and large one cohesive computing unit, e.g. via a [[single system image]] concept.<ref name=nbis />
Line 47 ⟶ 54:
==Benefits==
<!-- This used to be a list. Work has been done since, but it's still incomplete. -->
Clusters are primarily designed with performance in mind, but installations are based on many other factors. Fault tolerance (''the ability of a system to continue operating despite a malfunctioning node'') enables [[horizontal scaling|scalability]], and in high-performance situations, allows for a low frequency of maintenance routines, resource consolidation (e.g., [[RAID]]), and centralized management. Advantages include enabling data recovery in the event of a disaster and providing parallel data processing and high processing capacity.<ref>{{cite web|url=http://www-03.ibm.com/systems/clusters/benefits.html|title=IBM Cluster System : Benefits|publisher=[[IBM]]|access-date=8 September 2014|archive-url=https://web.archive.org/web/20160429022854/http://www-03.ibm.com/systems/clusters/benefits.html|archive-date=29 April 2016|url-status=dead}}</ref><ref>{{cite web|url=https://technet.microsoft.com/en-us/library/cc778629(v=ws.10).aspx|title=Evaluating the Benefits of Clustering|date=28 March 2003|publisher=[[Microsoft]]|access-date=8 September 2014|archive-url=https://web.archive.org/web/20160422092651/https://technet.microsoft.com/en-us/library/cc778629%28v%3Dws.10%29.aspx|archive-date=22 April 2016|url-status=dead}}</ref>
 
Clusters provide scalability through their ability to add nodes horizontally: more computers may be added to the cluster to improve its performance, redundancy and fault tolerance. This can be an inexpensive way to obtain a higher-performing cluster compared to scaling up a single node in the cluster. This property allows larger computational loads to be executed by a larger number of lower-performing computers.
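
The effect of adding nodes, and of losing one, can be illustrated with a toy scheduler. This is only a sketch under stated assumptions: the node names, the <code>assign</code> and <code>max_load</code> helpers, and the round-robin policy are hypothetical and are not taken from any particular clustering middleware.

```python
from collections import defaultdict


def assign(tasks, nodes, failed=frozenset()):
    """Distribute tasks round-robin over the nodes that are still healthy.

    Hypothetical helper for illustration; real cluster schedulers use
    far richer policies than round-robin.
    """
    healthy = [n for n in nodes if n not in failed]
    if not healthy:
        raise RuntimeError("no healthy nodes left in the cluster")
    plan = defaultdict(list)
    for i, task in enumerate(tasks):
        plan[healthy[i % len(healthy)]].append(task)
    return dict(plan)


def max_load(plan):
    """Return the largest number of tasks assigned to any single node."""
    return max(len(assigned) for assigned in plan.values())


tasks = list(range(12))
four_nodes = ["node1", "node2", "node3", "node4"]

small = assign(tasks, ["node1", "node2"])            # two-node cluster
large = assign(tasks, four_nodes)                    # scaled out to four nodes
degraded = assign(tasks, four_nodes, failed={"node3"})  # one node has failed

print(max_load(small), max_load(large), max_load(degraded))  # prints: 6 3 4
```

In this sketch, doubling the node count halves the heaviest per-node load, and when one node fails the remaining nodes absorb its share so that every task is still assigned.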
Line 63 ⟶ 70:
A special purpose 144-node [[DEGIMA (computer cluster)|DEGIMA cluster]] is tuned to running astrophysical N-body simulations using the Multiple-Walk parallel tree code, rather than general purpose scientific computations.<ref name=Hamada>{{cite journal|first=Tsuyoshi|last=Hamada |display-authors=etal |year=2009|title=A novel multiple-walk parallel algorithm for the Barnes–Hut treecode on GPUs – towards cost effective, high performance N-body simulation|journal=Computer Science – Research and Development|volume=24|issue=1–2 |pages=21–31 |doi=10.1007/s00450-009-0089-1|s2cid=31071570 }}</ref>
 
Due to the increasing computing power of each generation of [[game console]]s, a novel use has emerged where they are repurposed into [[High-performance computing]] (HPC) clusters. Some examples of game console clusters are [[PlayStation 3 cluster|Sony PlayStation clusters]] and [[Microsoft]] [[Xbox (console)|Xbox]] clusters. Another example of a consumer game product is the [[Nvidia Tesla Personal Supercomputer]] workstation, which uses multiple graphics accelerator processor chips. Besides game consoles, high-end graphics cards can also be used. Using graphics cards (or rather their GPUs) to do calculations for grid computing is vastly more economical than using CPUs, despite being less precise. However, when using double-precision values, they become as precise to work with as CPUs and are still much less costly to purchase.<ref name="pcauthority">{{cite web |last=Graham-Smith |first=Darien |date=29 June 2012 |title=Weekend Project: Build your own supercomputer |url=http://www.pcauthority.com.au/Feature/306972,weekend-project-build-your-own-supercomputer.aspx |access-date=2 June 2017 |website=PC & Tech Authority}}</ref>
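
The precision gap between single- and double-precision arithmetic can be illustrated with a short sketch. It runs on an ordinary CPU and only emulates single precision by rounding through Python's <code>struct</code> module, so it says nothing about GPU speed or cost; the helper names <code>to_single</code> and <code>accumulate</code>, the step size and the iteration count are assumptions chosen for illustration.

```python
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(step, count, single):
    """Add `step` to a running total `count` times.

    When `single` is true, the step and every intermediate total are
    rounded to single precision, emulating 32-bit floating-point arithmetic.
    """
    total = 0.0
    if single:
        step = to_single(step)
    for _ in range(count):
        total = total + step
        if single:
            total = to_single(total)
    return total


exact = 10000.0  # mathematically exact result of adding 0.1 one hundred thousand times
single_sum = accumulate(0.1, 100_000, single=True)
double_sum = accumulate(0.1, 100_000, single=False)

single_error = abs(single_sum - exact)
double_error = abs(double_sum - exact)

print(f"single-precision error: {single_error:.6f}")
print(f"double-precision error: {double_error:.6e}")
```

The accumulated rounding error is far larger in the emulated single-precision run than in the double-precision run, which is the kind of difference the paragraph above refers to.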
 
Computer clusters have historically run on separate physical [[computer]]s with the same [[operating system]]. With the advent of [[virtualization]], the cluster nodes may run on separate physical computers with different operating systems, which are overlaid with a virtual layer so that they look similar.<ref name=linuxjournal>{{cite web|url=http://www.linuxjournal.com/article/8812|title=Xen Virtualization and Linux Clustering, Part 1|date=12 Jan 2006|website=Linux Journal|first=Ryan|last=Mauer|access-date=2 Jun 2017}}</ref>{{citation needed|date=November 2013}}{{clarify|date=November 2013}} The cluster may also be virtualized on various configurations as maintenance takes place; an example implementation is [[Xen]] as the virtualization manager with [[Linux-HA]].<ref name="linuxjournal" />
Line 152 ⟶ 159:
:* [[Solaris Cluster]]
:* [[Veritas Cluster Server]]
:* [[Beowulf cluster]]
 
''Computer farms''
Line 172 ⟶ 180:
{{Commons category|Clusters (computing)}}
* [https://web.archive.org/web/20190219183441/https://www.ieeetcsc.org/ IEEE Technical Committee on Scalable Computing (TCSC)]
* [https://archive.today/20130103192843/http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=%2Fcom/com.ibm.cluster.rsct.doc%2Frsctbooks/rsctbooks.html Reliable Scalable Cluster Technology, IBM]{{Dead link|date=July 2020 |bot=InternetArchiveBot |fix-attempted=yes }}
* [https://www.ibm.com/developerworks/wikis/display/tivoli/Tivoli+System+Automation Tivoli System Automation Wiki]
* [https://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/43438.pdf Large-scale cluster management at Google with Borg], April 2015, by Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune and John Wilkes