Supercomputer operating system: Difference between revisions

m top: Fix ISBN
m Replace magic links with templates per local RfC and MediaWiki RfC
Given that modern [[massively parallel]] supercomputers typically separate computations from other services by using multiple types of [[Locale (computer hardware)|nodes]], they usually run different operating systems on different nodes, e.g. using a small and efficient [[Lightweight Kernel Operating System|lightweight kernel]] such as [[CNK operating system|Compute Node Kernel]] (CNK) or [[Compute Node Linux]] (CNL) on compute nodes, but a larger system such as a [[Linux]] derivative on server and [[input/output]] (I/O) nodes.<ref name=EuroPar2004/><ref name=Alam>''An Evaluation of the Oak Ridge National Laboratory Cray XT3'' by Sadaf R. Alam et al., International Journal of High Performance Computing Applications, February 2008, vol. 22 no. 1, pages 52-80</ref>
 
While in a traditional multi-user computer system [[job scheduling]] is in effect a [[task scheduling|tasking]] problem for processing and peripheral resources, in a massively parallel system the job management system needs to manage the allocation of both computational and communication resources, and to deal gracefully with the inevitable hardware failures that occur when tens of thousands of processors are present.<ref name=Yariv>Open Job Management Architecture for the Blue Gene/L Supercomputer by Yariv Aridor et al. in ''Job scheduling strategies for parallel processing'' by Dror G. Feitelson 2005 {{ISBN |978-3-540-31024-2}} pages 95-101</ref>
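To make the allocation problem concrete, the following minimal Python sketch (all class and method names are hypothetical, not any real job manager's API) hands out compute nodes to jobs and shows how a hardware failure forces the manager to quarantine the dead node and reclaim the surviving resources:

```python
# Toy model of massively parallel job management: allocate compute nodes
# to jobs, and recover gracefully when individual nodes fail.

class JobManager:
    def __init__(self, node_count):
        self.free = set(range(node_count))   # healthy, idle node ids
        self.failed = set()                  # nodes lost to hardware faults
        self.jobs = {}                       # job id -> set of allocated nodes

    def submit(self, job_id, nodes_needed):
        """Allocate nodes for a job; return False if not enough are free."""
        if nodes_needed > len(self.free):
            return False
        alloc = {self.free.pop() for _ in range(nodes_needed)}
        self.jobs[job_id] = alloc
        return True

    def node_failure(self, node):
        """A node died: quarantine it and release its job's surviving nodes."""
        self.failed.add(node)
        self.free.discard(node)
        for job_id, alloc in list(self.jobs.items()):
            if node in alloc:
                del self.jobs[job_id]        # the job must be rescheduled
                self.free |= alloc - self.failed
```

A real job manager would also requeue the interrupted job and account for the interconnect topology when choosing which free nodes to hand out; this sketch only captures the bookkeeping.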
 
Although most modern supercomputers use the [[Linux]] operating system,<ref>{{cite web
 
[[File:Pleiades supercomputer.jpg|thumb|240px|left|The supercomputer center at [[NASA Ames]]]]
In the 1980s the cost of software development at [[Cray]] came to equal what the company spent on hardware, and that trend was partly responsible for a move away from in-house operating systems toward the adaptation of generic software.<ref name=MacKenzie>''Knowing machines: essays on technical change'' by Donald MacKenzie 1998 {{ISBN |0-262-63188-1}} pages 149-151</ref> The first wave of operating system changes came in the mid-1980s, as vendor-specific operating systems were abandoned in favor of [[Unix]]. Despite early skepticism, this transition proved successful.<ref name=Padua426 /><ref name=MacKenzie />
 
By the early 1990s, major changes were occurring in supercomputing system software.<ref name=Padua426>''Encyclopedia of Parallel Computing'' by David Padua 2011 {{ISBN |0-387-09765-1}} pages 426-429</ref> By this time, the growing use of Unix had begun to change the way system software was viewed. The use of a high-level language ([[C (programming language)|C]]) to implement the operating system, and the reliance on standardized interfaces, stood in contrast to the [[assembly language]]-oriented approaches of the past.<ref name=Padua426 /> As hardware vendors adapted Unix to their systems, new and useful features were added to Unix, e.g., fast file systems and tunable [[process scheduler]]s.<ref name=Padua426 /> However, all the companies that adapted Unix made unique changes to it rather than collaborating on an industry standard to create a "Unix for supercomputers", partly because the differences in their architectures required changes to optimize Unix for each of them.<ref name=Padua426 />
 
Thus, as general-purpose operating systems became stable, supercomputers began to borrow and adapt critical system code from them, relying on the rich set of secondary functions that came with them instead of reinventing the wheel.<ref name=Padua426 /> However, at the same time the size of the code for general-purpose operating systems was growing rapidly; by the time Unix-based code had reached 500,000 lines, its maintenance and use were a challenge.<ref name=Padua426 /> This resulted in a move toward [[microkernel]]s, which implemented only a minimal set of operating system functions. Systems such as [[Mach (kernel)|Mach]] at [[Carnegie Mellon University]] and [[ChorusOS]] at [[INRIA]] were examples of early microkernels.<ref name=Padua426 />
==Early systems==
[[File:Cray 1 IMG 9126.jpg|thumb|The first [[Cray-1]] (sample shown with internals) was delivered to the customer with no operating system.<ref>''Targeting the computer: government support and international competition'' by Kenneth Flamm 1987 {{ISBN |0-8157-2851-4}} page 82 [https://books.google.it/books?id=6sf0g4q5Ue8C&pg=PA82&dq=%22Cray-1%22+delivered+%22without+software%22+%22operating+system%22&hl=en&sa=X&ei=RlpKT6nFDo3gtQaQrcGYBQ&sqi=2&redir_esc=y#v=onepage&q=%22Cray-1%22%20delivered%20%22without%20software%22%20%22operating%20system%22&f=false]</ref>]]
The [[CDC 6600]], generally considered the first supercomputer in the world, ran the [[Chippewa Operating System]], which was then deployed on various other [[CDC 6000 series]] computers.<ref name=Vardalas >''The computer revolution in Canada'' by John N. Vardalas 2001 {{ISBN |0-262-22064-4}} page 258</ref> The Chippewa was a rather simple [[job control (computing)|job control]] oriented system derived from the earlier [[CDC 3000]], but it influenced the later [[CDC KRONOS|KRONOS]] and [[CDC SCOPE (software)|SCOPE]] systems.<ref name=Vardalas /><ref>''Design of a computer: the Control Data 6600'' by James E. Thornton, Scott, Foresman Press 1970 page 163</ref>
 
The first [[Cray 1]] was delivered to the Los Alamos Lab with no operating system or any other software.<ref name=Flamm>''Targeting the computer: government support and international competition'' by Kenneth Flamm 1987 {{ISBN |0-8157-2851-4}} pages 81-83</ref> Los Alamos developed both the application software and the operating system for it.<ref name=Flamm /> The main timesharing system for the Cray 1, the [[Cray Time Sharing System]] (CTSS), was then developed at the Livermore Labs as a direct descendant of the [[Livermore Time Sharing System]] (LTSS), the CDC 6600 operating system from twenty years earlier.<ref name=Flamm />
 
In developing supercomputers, rising software costs soon became dominant, as evidenced by the cost of software development at Cray in the 1980s growing to equal what was spent on hardware.<ref name="MacKenzie"/> That trend was partly responsible for the move away from the in-house [[Cray Operating System]] to the Unix-based [[UNICOS]] system.<ref name=MacKenzie /> In 1985, the [[Cray 2]] was the first system to ship with the UNICOS operating system.<ref name=Power>Lester T. Davis, ''The balance of power, a brief history of Cray Research hardware architectures'' in "High performance computing: technology, methods, and applications" by J. J. Dongarra 1995 {{ISBN |0-444-82163-5}} page 126 [https://books.google.com/books?id=iqSWDaSFNvkC&pg=PA126&dq=cray+2++%22operating+system%22&hl=en&ei=yN8-TqWBHYP1sgb5mKgL&sa=X&oi=book_result&ct=result&resnum=1&ved=0CC8Q6AEwAA#v=onepage&q=cray%202%20%20%22operating%20system%22&f=false]</ref>
 
Around the same time, the [[EOS (operating system)|EOS]] operating system was developed by [[ETA Systems]] for use in their [[ETA10]] supercomputers.<ref name=Thorndyke>Lloyd M. Thorndyke, ''The Demise of the ETA Systems'' in "Frontiers of Supercomputing II" by Karyn R. Ames, Alan Brenner 1994 {{ISBN |0-520-08401-2}} pages 489-497</ref> Written in [[Cybil (computer language)|Cybil]], a Pascal-like language from [[Control Data Corporation]], EOS highlighted the difficulty of developing a stable operating system for supercomputers, and eventually a Unix-like system was offered on the same machine.<ref name=Thorndyke /><ref>''Past, present, parallel: a survey of available parallel computer systems'' by Arthur Trew 1991 {{ISBN |3-540-19664-1}} page 326</ref> The lessons learned from developing the ETA system software included the high level of risk associated with writing a new supercomputer operating system and the advantages of using Unix, with its large extant base of system software libraries.<ref name=Thorndyke />
 
By the mid-1990s, despite the existing investment in older operating systems, the trend was toward the use of Unix-based systems, which also facilitated the use of interactive [[graphical user interface]]s (GUIs) for [[scientific computing]] across multiple platforms.<ref>''Frontiers of Supercomputing II'' by Karyn R. Ames, Alan Brenner 1994 {{ISBN |0-520-08401-2}} page 356</ref> The move toward a ''commodity OS'' had opponents, who cited the fast pace and focus of Linux development as a major obstacle to adoption.<ref>{{cite web |url=http://www.sandia.gov/~rbbrigh/slides/conferences/commodity-os-ipdps03-slides.pdf |title=On the Appropriateness of Commodity Operating Systems for Large-Scale, Balanced Computing Systems |accessdate=January 29, 2013 |author=Brightwell, Ron; Riesen, Rolf; Maccabe, Arthur}}</ref> As one author wrote, "Linux will likely catch up, but we have large-scale systems now." Nevertheless, the trend continued to gain momentum, and by 2005 virtually all supercomputers used some [[Unix-like]] OS.<ref name=National136>''Getting up to speed: the future of supercomputing'' by Susan L. Graham, Marc Snir, Cynthia A. Patterson, National Research Council 2005 {{ISBN |0-309-09502-6}} page 136</ref> These variants of Unix included [[IBM AIX]], the open-source [[Linux]] system, and other adaptations such as [[UNICOS]] from Cray.<ref name=National136 /> By the end of the 20th century, Linux was estimated to command the largest share of the supercomputing pie.<ref name=Padua426 /><ref>[http://www.forbes.com/2005/03/15/cz_dl_0315linux.html Forbes magazine, 03.15.05: ''Linux Rules Supercomputers'']</ref>
 
==Modern approaches==
[[File:IBM Blue Gene P supercomputer.jpg|240px|thumb|The [[Blue Gene]]/P supercomputer at [[Argonne National Laboratory|Argonne National Lab]]]]
The IBM [[Blue Gene]] supercomputer uses the [[CNK operating system]] on the compute nodes, but a modified [[Linux]]-based kernel called the I/O Node Kernel ([[INK (operating system)|INK]]) on the I/O nodes.<ref name=EuroPar2004>''Euro-Par 2004 Parallel Processing: 10th International Euro-Par Conference'' 2004, by Marco Danelutto, Marco Vanneschi and Domenico Laforenza {{ISBN |3-540-22924-8}} page 835</ref><ref name=EuroPar2006>''Euro-Par 2006 Parallel Processing: 12th International Euro-Par Conference'', 2006, by Wolfgang E. Nagel, Wolfgang V. Walter and Wolfgang Lehner {{ISBN |3-540-37783-2}} page</ref> CNK is a [[Lightweight Kernel Operating System|lightweight kernel]] that runs on each compute node and supports a single application running for a single user on that node. For the sake of efficient operation, the design of CNK was kept simple and minimal: physical memory is statically mapped, and CNK neither needs nor provides scheduling or context switching.<ref name=EuroPar2004 /> CNK does not even implement [[Input/output|file I/O]] on the compute node, but delegates it to dedicated I/O nodes.<ref name=EuroPar2006 /> However, given that on the Blue Gene multiple compute nodes share a single I/O node, the I/O node's operating system does require multi-tasking, hence the selection of a Linux-based operating system.<ref name=EuroPar2004/><ref name=EuroPar2006/>
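The delegation of file I/O described above can be sketched schematically. The Python toy below is an illustration, not IBM's actual protocol (the class names and in-memory "filesystem" are hypothetical): each lightweight compute-node kernel implements no I/O of its own and forwards every request to a shared I/O node running a full multi-tasking OS:

```python
# Schematic model of compute nodes shipping I/O requests to a shared I/O node.

class IONode:
    """Runs a multi-tasking OS; services I/O for many compute nodes."""
    def __init__(self):
        self.files = {}                      # hypothetical in-memory filesystem

    def handle(self, op, path, data=None):
        if op == "write":
            self.files.setdefault(path, b"")
            self.files[path] += data
            return len(data)
        if op == "read":
            return self.files.get(path, b"")
        raise ValueError("unsupported op: " + op)

class ComputeNode:
    """Lightweight kernel: one task, no local file I/O; ships requests out."""
    def __init__(self, io_node):
        self.io_node = io_node

    def write(self, path, data):
        return self.io_node.handle("write", path, data)

    def read(self, path):
        return self.io_node.handle("read", path)

# Several compute nodes share one I/O node, as on the Blue Gene.
io = IONode()
cn = [ComputeNode(io) for _ in range(4)]
cn[0].write("/out.dat", b"hello ")
cn[1].write("/out.dat", b"world")
```

Because many single-task compute nodes funnel requests into one I/O node, that node must multiplex between them, which is exactly why it needs a multi-tasking kernel while the compute nodes do not.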
 
While in traditional multi-user computer systems and early supercomputers [[job scheduling]] was in effect a [[task scheduling]] problem for processing and peripheral resources, in a massively parallel system the job management system needs to manage the allocation of both computational and communication resources.<ref name=Yariv /> It is essential to tune task scheduling and the operating system for the different configurations of a supercomputer. A typical parallel job scheduler has a [[Master/slave (technology)|master scheduler]] that instructs a number of slave schedulers to launch, monitor, and control [[Parallel computing|parallel jobs]], and periodically receives reports from them on the status of job progress.<ref name=Yariv />
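As a rough illustration of this master/slave structure (class and method names are hypothetical, not any real scheduler's API), a master can place jobs on slave schedulers and periodically poll them for progress reports:

```python
# Toy master/slave job scheduler: the master launches jobs via slaves
# and periodically collects status reports from them.

class SlaveScheduler:
    def __init__(self, name):
        self.name = name
        self.running = {}                    # job id -> remaining work units

    def launch(self, job_id, work_units):
        self.running[job_id] = work_units

    def step_and_report(self):
        """Advance every job by one unit and report progress to the master."""
        finished = [j for j, left in self.running.items() if left <= 1]
        for j in self.running:
            self.running[j] -= 1
        for j in finished:
            del self.running[j]
        return {"slave": self.name, "finished": finished,
                "active": list(self.running)}

class MasterScheduler:
    def __init__(self, slaves):
        self.slaves = slaves
        self.next_slave = 0

    def submit(self, job_id, work_units):
        # Round-robin placement; real systems use far richer policies.
        s = self.slaves[self.next_slave % len(self.slaves)]
        self.next_slave += 1
        s.launch(job_id, work_units)

    def poll(self):
        """The periodic report collection the text describes."""
        return [s.step_and_report() for s in self.slaves]
```

In a production scheduler the "step" would be real job execution on compute nodes and the reports would arrive asynchronously over the management network; the control structure, however, is the same.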
 
Some, but not all, supercomputer schedulers attempt to maintain locality of job execution. The [[PBS Pro|PBS Pro scheduler]] used on the [[Cray XT3]] and [[Cray XT4]] systems does not attempt to optimize locality on their three-dimensional [[torus interconnect]], but simply uses the first available processor.<ref name=Eitan/> On the other hand, IBM's scheduler on the Blue Gene supercomputers aims to exploit locality and minimize network contention by assigning tasks from the same application to one or more midplanes of an 8×8×8 node group.<ref name=Eitan>''Job Scheduling Strategies for Parallel Processing'' by Eitan Frachtenberg and Uwe Schwiegelshohn 2010 {{ISBN |3-642-04632-0}} pages 138-144</ref> The [[Slurm Workload Manager]] scheduler uses a best-fit algorithm and performs [[Hilbert curve scheduling]] to optimize the locality of task assignments.<ref name=Eitan/> Several modern supercomputers, such as the [[Tianhe-2]], use Slurm, which arbitrates contention for resources across the system. Slurm is [[open source]], Linux-based, and very scalable, and can manage thousands of nodes in a computer cluster with a sustained throughput of over 100,000 jobs per hour.<ref>[http://slurm.schedmd.com/ SLURM at SchedMD]</ref><ref>Jette, M. and M. Grondona, ''SLURM: Simple Linux Utility for Resource Management'' in the Proceedings of ClusterWorld Conference, San Jose, California, June 2003 [http://www.schedmd.com/slurmdocs/slurm_design.pdf]</ref>
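The idea behind Hilbert curve scheduling can be illustrated in two dimensions (real schedulers such as Slurm apply it to the machine's actual topology, so the following is a simplified sketch using the classic iterative distance-to-coordinates conversion). Points that are consecutive along a Hilbert curve are adjacent in space, so allocating nodes in curve order keeps a job's nodes physically close together:

```python
def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve filling a 2**order x 2**order
    grid into (x, y) coordinates (classic iterative algorithm)."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:              # rotate the quadrant so sub-curves join up
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# Walking the curve visits every cell of the grid exactly once, and each
# step moves to a neighbouring cell, which is the locality property a
# scheduler exploits when it allocates nodes in curve order.
walk = [hilbert_d2xy(2, d) for d in range(16)]
```

On a 4×4 grid the first four curve positions are (0,0), (1,0), (1,1) and (0,1), a compact 2×2 block, whereas first-available allocation in row-major order would hand the same job a 1×4 strip with longer communication paths.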
 
==See also==