Lightweight kernel operating system

A '''lightweight kernel''' (LWK) operating system is one used in a large computer with many [[central processing unit|processor]] cores, termed a [[Parallel computing|parallel computer]].
 
A [[massively parallel]], [[high-performance computing]] (HPC) system is particularly sensitive to [[operating system]] overhead. Traditional multi-purpose operating systems are designed to support a wide range of usage models and requirements. To meet that range of needs, they provide a large number of system processes, which are often interdependent. The computing overhead of these processes makes the amount of processor time available to a parallel application unpredictable. A very common [[parallel programming model]] is the [[bulk synchronous parallel]] model, which often employs the [[Message Passing Interface]] (MPI) for communication. Synchronization events occur at specific points in the [[application code]]. If one processor takes longer to reach such a point than the others, all of them must wait, and the overall finish time increases. Unpredictable operating system overhead is one significant reason a processor might take longer to reach the synchronization point than the others.
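The waiting effect described above can be illustrated with a minimal sketch. It assumes a simplified model in which each bulk-synchronous step lasts as long as its slowest processor; the function names and the timing figures are hypothetical and chosen only for illustration.

```python
def superstep_time(compute_times, noise_times):
    """Duration of one bulk-synchronous step: all ranks wait for the slowest."""
    return max(c + n for c, n in zip(compute_times, noise_times))


def run_time(steps):
    """Total run time: the steps execute one after another."""
    return sum(superstep_time(compute, noise) for compute, noise in steps)


# Hypothetical workload: four ranks, 10 ms of useful work per rank per step.
work = [10.0, 10.0, 10.0, 10.0]

# No operating system interruptions at all.
quiet = [(work, [0.0, 0.0, 0.0, 0.0])] * 3

# Assumed interruptions: in each step a different rank loses 2 ms.
noisy = [
    (work, [2.0, 0.0, 0.0, 0.0]),
    (work, [0.0, 0.0, 2.0, 0.0]),
    (work, [0.0, 2.0, 0.0, 0.0]),
]

quiet_total = run_time(quiet)  # 30.0 ms
noisy_total = run_time(noisy)  # 36.0 ms
print(quiet_total, noisy_total)
```

In this model no single rank loses more than 2 ms to interruptions, yet the whole run takes 6 ms longer, because every step is delayed by whichever rank happened to be interrupted.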
== Examples ==
Custom lightweight kernel operating systems, currently used on some of the fastest computers in the world, help alleviate this problem. The [[IBM]] [[Blue Gene]] line of [[supercomputer]]s runs various versions of [[CNK operating system]].<ref name=bgl-cnk>{{cite journal
| title = Designing a Highly-Scalable Operating System: The Blue Gene/L Story
| publisher = Proceedings of the 2006 ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC’06)
| author = Moreira, Jose|date=November 2006
|display-authors=etal}}</ref>
The [[Cray XT4]] and [[Cray XT5]] supercomputers run [[Compute Node Linux]],<ref name=cnl-dwb>{{cite journal
| title = Compute Node Linux: Overview, progress to date, and roadmap
| publisher = Proceedings of the 2007 Cray User Group Annual Technical Conference
| author = Wallace, D.
| date =May 2007
}}</ref> while the earlier XT3 ran the lightweight kernel [[Catamount (operating system)|Catamount]], which was based on [[SUNMOS]].
[[Sandia National Laboratories]] has an almost two-decade commitment to lightweight kernels on its high-end HPC systems.<ref name=lwk-rr>{{cite journal
| title = Designing and Implementing Lightweight Kernels for Capability Computing
| publisher = Concurrency and Computation: Practice and Experience
| author = Riesen, Rolf|date=April 2009
|display-authors=etal}}</ref>
Sandia and University of New Mexico researchers began work on [[SUNMOS]] for the [[Intel Paragon]] in the early 1990s. This operating system evolved into Puma, Cougar (which achieved the first teraflop on [[ASCI Red]]), and Catamount on [[Red Storm (computing)|Red Storm]]. Sandia continues its work in LWKs with a new R&D effort called Kitten.<ref name=pedretti>{{cite web
| url = https://software.sandia.gov/trac/kitten
| title = Kitten Lightweight Kernel
}}</ref>
 
== Characteristics ==
Although it is surprisingly difficult to define exactly what a lightweight kernel is,<ref>{{cite book |last1=Riesen |first1=Rolf |title=Proceedings of the 5th International Workshop on Runtime and Operating Systems for Supercomputers |chapter=What is a Lightweight Kernel? |display-authors=etal |pages=1–8 |date=June 2015 |doi=10.1145/2768405.2768414 |chapter-url=https://dl.acm.org/citation.cfm?id=2768414 |accessdate=19 October 2019|isbn=9781450336062 |s2cid=11698915 }}</ref> there are some common design goals:
* Targeted at massively parallel environments composed of thousands of processors with distributed memory and a tightly coupled network.
* Provide necessary support for scalable, performance-oriented scientific applications.
* Offer a suitable development environment for parallel applications and libraries.
* Emphasize efficiency over functionality.
* Maximize the amount of resources (e.g., CPU, memory, and network bandwidth) allocated to the application.
* Seek to minimize time to completion for the application.<ref name=cat-smk>{{cite journal
| title = Software Architecture of the Light Weight Kernel, Catamount
| publisher = Proceedings of the 2005 Cray User Group Annual Technical Conference
| author1= Kelly, S. |author2=Brightwell, R.
| date =May 2005
}}</ref>
 
== Implementation ==
LWK implementations vary, but all strive to provide applications with predictable and maximum access to the [[central processing unit]] (CPU) and other system resources. To achieve this, they usually use simplified algorithms for scheduling and memory management. System services (e.g., daemons) are limited to the absolute minimum. The services that remain, such as job launch, are constructed hierarchically to ensure scalability to thousands of nodes. Networking protocols for communication between nodes in the system are also carefully selected and implemented to ensure scalability. One example is the [[Portals network programming application programming interface]] (API).
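Why a hierarchical job launch scales can be shown with a rough sketch. It assumes an idealized launch tree in which the launcher contacts a fixed number of nodes and every node that has been reached forwards the request to the same number of further nodes; the function name and the fan-out of 32 are hypothetical and do not describe any particular LWK launcher.

```python
def tree_launch_depth(nodes, fanout):
    """Levels of forwarding needed to reach `nodes` compute nodes.

    Idealized model: level 1 holds `fanout` nodes contacted by the
    launcher, level 2 holds `fanout**2` nodes, and so on.
    """
    if nodes < 0 or fanout < 2:
        raise ValueError("need nodes >= 0 and fanout >= 2")
    depth = 0
    reached = 0
    level_width = 1  # the launcher itself
    while reached < nodes:
        level_width *= fanout
        reached += level_width
        depth += 1
    return depth


# With an assumed fan-out of 32, 10,000 nodes are reached in 3 levels,
# and no participant ever contacts more than 32 others.
depth = tree_launch_depth(10_000, 32)
print(depth)
```

A flat launch would instead require the launcher alone to contact all 10,000 nodes, so its work grows linearly with machine size, whereas the depth of the tree grows only logarithmically.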
 
Lightweight kernel operating systems assume access to a small set of nodes running full-service operating systems, to which they offload some necessary services: login access, compiling environments, batch job submission, and file I/O.
 
By restricting services to those that are absolutely necessary and by streamlining those that are provided, the overhead (sometimes called noise) of the lightweight operating system is minimized. This allows a significant ''and'' predictable amount of the processor cycles to be given to the parallel application. Since the application can make consistent forward progress on each processor, the processors reach their synchronization points sooner, ideally at the same time, and the time lost to waiting is reduced.
 
== Future ==
The last supercomputers running lightweight kernels are the remaining IBM [[Blue Gene]] systems running [[CNK operating system|CNK]]. A new direction for lightweight kernels is to combine them with a full-featured OS, such as Linux, on a many-core node. These multi-kernel operating systems run a lightweight kernel on some of the CPU cores of a node, while other cores provide services that are omitted in lightweight kernels. By combining the two, users get the Linux features they need but also the deterministic behavior and scalability of lightweight kernels.
 
== References ==
{{Reflist}}
 
{{Supercomputer operating systems}}
 
[[Category:Supercomputer operating systems]]
[[Category:Massively parallel computers]]