Lightweight kernel operating system: Difference between revisions

Smk-slab (talk | contribs)
mNo edit summary
Smk-slab (talk | contribs)
Expanded description of LWK functionality and implementation details
Line 1:
A [[massively parallel]] [[high-performance computing]] (HPC) system is particularly sensitive to [[operating system]] overhead. Traditional multi-purpose operating systems are designed to support a wide range of usage models and requirements. To meet these needs, they run a large number of system processes, many of which depend on one another. The computing overhead of these processes makes the amount of processor time available to a parallel application unpredictable. A very common [[parallel programming model]] is the [[bulk synchronous parallel]] model, which often employs [[MPI]] for communication. Synchronization occurs at specific points in the application code. If one processor takes longer to reach such a point than the others, all of them must wait, and the overall finish time increases. Unpredictable and frequent operating system overhead is one significant reason a processor might reach the synchronization point later than the others.
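
The effect of operating system overhead on a bulk synchronous application can be illustrated with a small simulation. The following Python sketch is illustrative only: the function name <code>bsp_runtime</code>, its parameters, and the noise model (each processor independently loses a fixed amount of time with a fixed probability in each step) are assumptions made for this example and are not taken from any particular system or measurement.

```python
import random


def bsp_runtime(num_procs, num_steps, work=1.0,
                noise_prob=0.0, noise_cost=0.0, seed=0):
    """Simulated wall-clock time of a bulk synchronous run (hypothetical model).

    In every step each processor computes for `work` time units. With
    probability `noise_prob` the operating system takes an additional
    `noise_cost` time units from that processor. A step ends only when
    the slowest processor reaches the synchronization point.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(num_steps):
        slowest = max(
            work + (noise_cost if rng.random() < noise_prob else 0.0)
            for _ in range(num_procs)
        )
        total += slowest  # everyone waits for the slowest processor
    return total


if __name__ == "__main__":
    for procs in (1, 16, 1024):
        t = bsp_runtime(procs, num_steps=100, noise_prob=0.01, noise_cost=0.5)
        print(f"{procs:5d} processors: {t:.1f} time units")
```

Under this model a single processor is rarely delayed, but as the processor count grows, the chance that at least one processor is delayed in a given step approaches 1, so nearly every step pays the full noise cost.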
 
Custom '''Lightweight Kernel''' (LWK) operating systems, currently used on some of the fastest computers in the world, help alleviate this problem. The [[IBM]] [[Blue Gene]] line of [[supercomputers]] runs various versions of the Compute Node Kernel (CNK).<ref name=bgl-cnk>{{cite paper
| title = Designing a Highly-Scalable Operating System: The Blue Gene/L Story
| publisher = Proceedings of the 2006 ACM/IEEE International Conference for High-Performance Computing, Networking, Storage, and Analysis (SC’06)
| author = Moreira, Jose, et al.
| date = 2006-11
}}</ref> The [[Cray XT4]] and [[Cray XT5]] supercomputers run [[Compute Node Linux]]<ref name=cnl-dwb>{{cite paper
| title = Compute Node Linux: Overview, progress to date, and roadmap
| publisher = Proceedings of the 2007 Cray User Group Annual Technical Conference
Line 16 ⟶ 17:
| author = Riesen, Rolf, et al.
| date = 2009-04
}}</ref>. Sandia and University of New Mexico researchers began work on [[SUNMOS]] for the [[Intel Paragon]] in the early 1990s. This operating system evolved into the Puma, Cougar, and Catamount operating systems deployed on [[ASCI Red]] and [[Red Storm]]. Sandia continues its work on LWKs with a new R&D effort called Kitten.<ref name=pedretti>{{cite web
| url = https://software.sandia.gov/trac/kitten
| title = Kitten Lightweight Kernel
Line 22 ⟶ 24:
 
The design goals of these operating systems are:
* Target massively parallel environments composed of thousands of processors with distributed memory and a tightly coupled network.
* Provide necessary support for scalable, performance-oriented scientific applications.
* Offer a suitable development environment for parallel applications and libraries.
* Emphasize efficiency over functionality.
* Maximize the amount of resources (e.g. CPU, memory, and network bandwidth) allocated to the application.
* Minimize the application's time to completion.<ref name=cat-smk>{{cite paper
| title = Software Architecture of the Light Weight Kernel, Catamount
| publisher = Proceedings of the 2005 Cray User Group Annual Technical Conference
| author = Kelly, S. and Brightwell, R.
| date = 2005-05
}}</ref>
 
LWK implementations vary, but all strive to give applications predictable and maximum access to the [[CPU]] and other system resources. To achieve this, they usually use simplified algorithms for scheduling and memory management. System services (e.g. daemons) are limited to the absolute minimum. Available services, such as job launch, are constructed in a hierarchical fashion to ensure scalability to thousands of nodes. Networking protocols for communication between nodes are also carefully selected and implemented to ensure scalability; one example is the [[Portals network programming api|Portals network programming API]].
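
One way to picture a hierarchical service such as job launch is as a tree in which each node forwards the launch request to a small number of children. The Python sketch below is a minimal illustration under that assumption. The names <code>launch_tree</code> and <code>tree_depth</code>, the array-style tree layout, and the fan-out values are hypothetical and do not describe the launcher of Catamount or any other specific LWK.

```python
def launch_tree(nodes, fanout):
    """Map each node to the children it forwards a launch request to.

    Nodes are arranged as a complete `fanout`-ary tree stored in list
    order, so the node at index i forwards to indices
    fanout*i + 1 .. fanout*i + fanout (where those exist).
    """
    n = len(nodes)
    return {
        nodes[i]: [nodes[c]
                   for c in range(fanout * i + 1,
                                  min(fanout * i + fanout + 1, n))]
        for i in range(n)
    }


def tree_depth(num_nodes, fanout):
    """Number of forwarding hops needed to reach the last node."""
    depth = 0
    i = num_nodes - 1
    while i > 0:
        i = (i - 1) // fanout  # step up to the parent index
        depth += 1
    return depth


if __name__ == "__main__":
    tree = launch_tree(list(range(7)), fanout=2)
    print(tree[0], tree[2], tree[3])
    print(tree_depth(4096, 32))
```

In this sketch no node contacts more than <code>fanout</code> others, and the number of forwarding hops grows only logarithmically with the node count, whereas a flat scheme would require a single node to contact every other node directly.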
 
Lightweight kernel operating systems assume access to a small set of nodes running full-service operating systems, to which they offload some necessary services: login access, compilation environments, batch job submission, and file I/O.
 
== References ==
<!--- See [[Wikipedia:Footnotes]] on how to create references using <ref></ref> tags which will then appear here automatically -->
{{Reflist}}
 
Line 38 ⟶ 46:
<!--- Categories --->
[[Category:Computing]]
[[Category:Operating system]]