Lightweight kernel operating system

BG19bot (talk | contribs)
m WP:CHECKWIKI error fix. Category problem. Syntax fixes. Do general fixes if a problem exists. - using AWB (9421)
A [[massively parallel]], [[high-performance computing]] (HPC) system is particularly sensitive to [[operating system]] overhead. Traditional multi-purpose operating systems are designed to support a wide range of usage models and requirements. To meet that range of needs, they run a large number of system processes, many of which are interdependent. The computing overhead of these processes makes the amount of processor time available to a parallel application unpredictable. A very common [[parallel programming model]] is the [[bulk synchronous parallel]] model, which often employs the [[Message Passing Interface]] (MPI) for communication. Synchronization occurs at specific points in the application code. If one processor takes longer to reach such a point than the others, all the other processors must wait for it, and the overall finish time increases. Unpredictable operating system overhead is one significant reason a processor might reach the synchronization point later than the others.
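The effect of unpredictable overhead on a bulk synchronous application can be illustrated with a small simulation. The following is a minimal sketch, not a model of any particular system: the function name <code>bsp_finish_time</code> and the parameters (work per step, probability and cost of an interruption) are hypothetical and chosen only for illustration. Each step ends at a barrier, so a step costs as much as its slowest processor.

```python
import random


def bsp_finish_time(num_procs, num_steps, work, noise_prob, noise_cost, seed=0):
    """Total time of a bulk synchronous run in which every step ends at a barrier."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_steps):
        # Every processor does the same work; some are delayed by an
        # operating system interruption (assumed fixed cost, assumed probability).
        step_times = [
            work + (noise_cost if rng.random() < noise_prob else 0.0)
            for _ in range(num_procs)
        ]
        # The barrier releases only when the slowest processor arrives.
        total += max(step_times)
    return total


# Illustrative parameters only: 1024 processors, 100 synchronized steps.
quiet = bsp_finish_time(1024, 100, work=1.0, noise_prob=0.0, noise_cost=0.5)
noisy = bsp_finish_time(1024, 100, work=1.0, noise_prob=0.01, noise_cost=0.5)
print(quiet, noisy)
```

Under these assumed numbers an interruption is rare on any single processor, yet with many processors it becomes likely that at least one of them is delayed in a given step, so the whole application pays the delay in most steps.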
 
Custom '''Lightweight Kernel''' (LWK) operating systems, currently used on some of the fastest computers in the world, help alleviate this problem. The [[IBM]] [[Blue Gene]] line of [[supercomputers]] runs various versions of the [[CNK operating system]].<ref name=bgl-cnk>{{cite journal
}}</ref>
 
LWK implementations vary, but all strive to give applications predictable and maximum access to the [[CPU]] and other system resources. To achieve this, they usually use simplified algorithms for scheduling and memory management. System services (e.g. daemons) are limited to the absolute minimum. The services that are available, such as job launch, are constructed in a hierarchical fashion to ensure scalability to thousands of nodes. Networking protocols for communication between nodes in the system are also carefully selected and implemented to ensure scalability. One example is the [[Portals network programming api]].
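Why a hierarchical construction scales can be seen by counting forwarding rounds. The sketch below assumes a simple tree in which every node that receives the launch message forwards it once to a fixed number of children; the function name <code>launch_rounds</code> and the fan-out values are hypothetical, and real LWK job launchers may be organized differently.

```python
def launch_rounds(num_nodes, fanout):
    """Forwarding rounds needed for a launch message to reach num_nodes nodes."""
    if num_nodes < 1 or fanout < 1:
        raise ValueError("num_nodes and fanout must be positive")
    reached = 1   # the launcher itself already has the message
    frontier = 1  # nodes that received the message in the previous round
    rounds = 0
    while reached < num_nodes:
        # Each node on the frontier forwards the message to `fanout` children.
        frontier *= fanout
        reached += frontier
        rounds += 1
    return rounds


# Assumed tree with fan-out 32 versus a chain (fan-out 1) over 4096 nodes.
print(launch_rounds(4096, 32))  # 3
print(launch_rounds(4096, 1))   # 4095
```

With a fixed fan-out the number of rounds grows roughly with the logarithm of the node count, which is the property a hierarchical service relies on.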
 
Lightweight Kernel operating systems assume access to a small set of nodes running full-service operating systems, to which they offload some of the necessary services: login access, compilation environments, batch job submission, and file I/O.
 
By restricting services to only those that are absolutely necessary and by streamlining those that are provided, the overhead (sometimes called noise) of the lightweight operating system is minimized. This allows a significant ''and'' predictable share of the processor cycles to be given to the parallel application. Because the application can make consistent forward progress on every processor, the processors reach their synchronization points at nearly the same time, and less time is lost to waiting.
{{Reflist}}
 
[[Category:Operating systems]]
[[Category:Massively parallel computers]]