{{Short description|Use of operating systems by a type of extremely powerful computer}}
[[File:JaguarXT5.jpg|thumb|330px|The [[Jaguar (supercomputer)|Jaguar XT5]] supercomputer at [[Oak Ridge National Laboratory|Oak Ridge National Labs]]]]
A '''supercomputer operating system''' is an [[operating system]] intended for [[supercomputer]]s. Since the end of the 20th century, supercomputer operating systems have undergone major transformations, as fundamental changes have occurred in [[supercomputer architecture]].<ref name=Padua426 /> While early operating systems were custom tailored to each supercomputer to gain speed, the trend has been moving away from in-house operating systems toward some form of [[Linux]],<ref name=MacKenzie /> which has run all the supercomputers on the [[TOP500]] list since November 2017. In 2021, the top 10 computers ran, for instance, [[Red Hat Enterprise Linux]] (RHEL) or some variant of it, or another [[Linux distribution]] such as [[Ubuntu]].
Modern supercomputers may run different operating systems on different nodes, e.g. using a small and efficient [[Lightweight Kernel Operating System|lightweight kernel]] such as [[CNK operating system|CNK]] or [[Compute Node Linux|CNL]] on compute nodes, but a larger and more full-fledged system such as a [[Linux]]-derivative on server and I/O nodes.<ref name=EuroPar2004/><ref name=Alam>''An Evaluation of the Oak Ridge National Laboratory Cray XT3'' by Sadaf R. Alam et al., ''International Journal of High Performance Computing Applications'', February 2008, vol. 22, no. 1, pp. 52–80</ref>
[[File:Operating systems used on top 500 supercomputers.svg|thumb|right|Operating systems used on top 500 supercomputers]]
Although most modern supercomputers use the [[Linux]] operating system, each manufacturer has made its own specific changes to the Linux-derivative it uses, and no industry standard exists, partly because differences in hardware architectures require changes to optimize the operating system to each design.<ref name=Padua426 /><ref>{{cite web|url=http://www.top500.org/overtime/list/32/os |title=Top500 OS chart |publisher=Top500.org |accessdate=2010-10-31}}</ref>
==Context and overview==
In the early days of supercomputing, the basic architectural concepts were evolving rapidly, and [[system software]] had to follow hardware innovations that usually took rapid turns.<ref name=Padua426 /> In the early systems, operating systems were custom tailored to each supercomputer to gain speed, yet in the rush to develop them, serious software quality challenges surfaced and in many cases the cost and complexity of system software development became as much of an issue as that of hardware.<ref name=Padua426 />
[[File:Pleiades supercomputer.jpg|thumb|240px|left|The supercomputer center at [[NASA Ames]]]]
In the 1980s the cost of software development at [[Cray]] came to equal what was spent on hardware, a trend that was partly responsible for the move away from in-house operating systems toward the adaptation of generic software.<ref name=MacKenzie />
By the early 1990s, major changes were occurring in supercomputing system software.<ref name=Padua426 />
The separation of the operating system into separate components became necessary as supercomputers developed different types of nodes, e.g., compute nodes versus I/O nodes. Thus modern supercomputers usually run different operating systems on different nodes, e.g., using a small and efficient [[lightweight kernel operating system|lightweight kernel]] such as [[CNK operating system|CNK]] or [[Compute Node Linux|CNL]] on compute nodes, but a larger system such as a [[Linux]]-derivative on server and I/O nodes.<ref name=EuroPar2004/><ref name=Alam/>
==Early systems==
[[File:Cray 1 IMG 9126.jpg|thumb|The first [[Cray-1]]]]
The [[CDC 6600]], generally considered the first supercomputer in the world, ran the [[Chippewa Operating System]], which was then deployed on various other [[CDC 6000 series]] computers.
The first [[Cray-1]] was shipped to [[Los Alamos National Laboratory]] without an operating system or any other software; Los Alamos then developed both the application software and the operating system for it.
Around the same time, the [[EOS (operating system)|EOS]] operating system was developed by [[ETA Systems]] for use in their [[ETA10]] supercomputers.<ref>Lloyd M. Thorndyke, ''The Demise of the ETA Systems'' in ''Frontiers of Supercomputing II'' by Karyn R. Ames, Alan Brenner 1994</ref>
By the middle of the 1990s, despite the existing investment in older operating systems, the trend was toward the use of Unix-based systems, which also facilitated the use of interactive [[graphical user interface]]s for scientific computing across multiple platforms.
==Modern approaches==
The IBM [[Blue Gene]] supercomputers use the [[CNK operating system]] on the compute nodes, but a modified [[Linux]]-based kernel called I/O Node Kernel (INK) on the I/O nodes.<ref name=EuroPar2004/> CNK is a [[Lightweight Kernel Operating System|lightweight kernel]] that runs on each node and supports a single application running for a single user on that node.
While in a traditional multi-user computer system, [[job scheduling]] is in effect a [[task scheduling|scheduling]] problem for processing and peripheral resources, in a massively parallel system the job management system needs to manage the allocation of both computational and communication resources.<ref name=Yariv /> It is essential to tune task scheduling, and the operating system itself, for the different configurations of a supercomputer. A typical parallel job scheduler has a master scheduler which instructs a number of slave schedulers to launch, monitor and control parallel jobs, and periodically receives reports from them about the status of job progress.<ref name=Yariv />
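The master–slave pattern described above can be shown schematically. The following Python sketch is an illustration only, not the code of any production scheduler; it uses in-process queues and threads to stand in for the network links between the master scheduler and its slave schedulers, and a short sleep to stand in for a running parallel job.

<syntaxhighlight lang="python">
import queue
import threading
import time

def slave_scheduler(node_group, jobs, reports):
    """Launch and monitor jobs for one group of compute nodes,
    reporting progress back to the master scheduler."""
    while True:
        job = jobs.get()
        if job is None:          # shutdown signal from the master
            return
        reports.put((node_group, job, "launched"))
        time.sleep(0.1)          # stand-in for the running parallel task
        reports.put((node_group, job, "finished"))

jobs, reports = queue.Queue(), queue.Queue()
slaves = [threading.Thread(target=slave_scheduler, args=(n, jobs, reports))
          for n in range(4)]
for s in slaves:
    s.start()

# The master hands out parallel jobs, then sends one shutdown
# signal per slave and collects the periodic status reports.
for job in ["job-a", "job-b", "job-c"]:
    jobs.put(job)
for _ in slaves:
    jobs.put(None)
while any(s.is_alive() for s in slaves) or not reports.empty():
    try:
        group, job, status = reports.get(timeout=0.5)
        print(f"node group {group}: {job} {status}")
    except queue.Empty:
        pass
for s in slaves:
    s.join()
</syntaxhighlight>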
Some, but not all, supercomputer schedulers attempt to maintain locality of job execution. The [[PBS Pro|PBS Pro scheduler]] used on the [[Cray XT3]] and [[Cray XT4]] systems does not attempt to optimize locality on its three-dimensional [[torus interconnect]], but simply uses the first available processor.<ref name=Eitan/> On the other hand, IBM's scheduler on the Blue Gene supercomputers aims to exploit locality and minimize network contention by assigning tasks from the same application to one or more midplanes of an 8×8×8 node group.<ref name=Eitan>''Job Scheduling Strategies for Parallel Processing'' by Eitan Frachtenberg and Uwe Schwiegelshohn 2010 {{ISBN|3-642-04632-0}} pages 138–144</ref> The [[Slurm Workload Manager]] scheduler uses a best fit algorithm and performs [[Hilbert curve scheduling]] to optimize locality of task assignments.<ref name=Eitan/> Several modern supercomputers such as the [[Tianhe-2]] use Slurm, which arbitrates contention for resources across the system. Slurm is [[Open-source software|open source]], Linux-based, very scalable, and can manage thousands of nodes in a computer cluster with a sustained throughput of over 100,000 jobs per hour.<ref>[http://slurm.schedmd.com/ SLURM at SchedMD]</ref><ref>Jette, M. and M. Grondona, ''SLURM: Simple Linux Utility for Resource Management'' in the Proceedings of ClusterWorld Conference, San Jose, California, June 2003 [http://www.schedmd.com/slurmdocs/slurm_design.pdf]</ref>
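Hilbert curve scheduling works by mapping the machine's nodes onto a one-dimensional [[space-filling curve]], so that tasks placed at consecutive positions along the curve land on physically nearby nodes. The following Python sketch shows the idea in two dimensions; it is illustrative only, and Slurm's actual implementation differs (for example, it must handle three-dimensional topologies).

<syntaxhighlight lang="python">
def hilbert_d2xy(order, d):
    """Map distance d along a Hilbert curve filling a 2**order x 2**order
    grid to (x, y) grid coordinates (the standard iterative algorithm)."""
    x = y = 0
    s = 1
    while s < (1 << order):
        rx = 1 & (d // 2)
        ry = 1 & (d ^ rx)
        if ry == 0:              # rotate the quadrant when needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        d //= 4
        s *= 2
    return x, y

# Order an 8x8 grid of nodes along the curve. Consecutive entries are
# physically adjacent, so giving a job a contiguous slice of this list
# keeps its tasks close together on the interconnect.
node_order = [hilbert_d2xy(3, d) for d in range(8 * 8)]

def allocate(job_size, cursor):
    """Assign the next job_size nodes along the Hilbert ordering."""
    return node_order[cursor:cursor + job_size], cursor + job_size

nodes, cursor = allocate(4, 0)
print(nodes)   # [(0, 0), (0, 1), (1, 1), (1, 0)] -- four adjacent nodes
</syntaxhighlight>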
==See also==
* [[Distributed operating system]]
* [[List of the top supercomputers in the United States]]
* [[Supercomputer architecture]]
* [[Usage share of operating systems#Supercomputers|Usage share of supercomputer operating systems]]
==References==
{{Reflist}}
{{Supercomputer operating systems}}
{{Parallel computing}}
{{Operating system}}
[[Category:Supercomputer operating systems| ]]
[[Category:Operating systems]]
[[Category:Supercomputers|*Supercomputer operating systems]]