Line 30:
===Distributed operating system essentials===
A distributed operating system, illustrated in a similar fashion, would be a container suggesting [[Microkernel#Essential components and minimality|minimal operating system functionality and scope]]. This container would completely cover all disseminated hardware resources, defining the system-level. The container would extend across the system, supporting a layer of modular software components existing in the user-level. These software components supplement the distributed operating system with a configurable set of added services, usually integrated within the monolithic operating system (and the system-level). This division of minimal system-level function from additional user-level modular services provides a “[[separation of mechanism and policy]].” Mechanism and policy can be simply interpreted as "how something is done" versus "why something is done," respectively. Achieving this separation allows for an exceptionally loosely coupled, flexible, and scalable distributed operating system.
== Overview ==
Line 47:
The architecture and design of a distributed operating system are specifically aligned with realizing both individual node and global system goals. Any architecture or design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide a highly efficient and reliable distributed computing framework that keeps user awareness of the underlying command and control efforts to an absolute minimum.<ref name="DCD"/>
The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system, is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity.
===The price of complexity===
Line 77:
| caption3 = Generalized organization of nodes in a distributed model.<br/><br/>
}}
The unique nature of the distributed operating system is both subtle and complex. A distributed operating system’s hardware infrastructure elements are not centralized; that is, the elements are not in close proximity to one another at a single location. A given distributed operating system’s structural elements could reside in various rooms within a building, or in various buildings around the world. This geographic dissemination defines its decentralization; however, the distributed operating system is
This distinction is the source of the subtlety and complexity. While decentralized systems and distributed operating systems are both spatially diverse, it is the specific manner and relative degree of linkage between the elements, or nodes, in the systems that differentiates the two. In the case of these two types of operating system, these linkages are the lines of [[Inter-process communication|communication]] between the nodes of the system.
=== Three basic distributions ===
Line 85:
====Organization====
Firstly, we consider the subject of organization. A centralized system is organized most simply: it has basically one real level of structure, with all constituent elements highly influenced by and ultimately dependent upon this organization. The decentralized system is a more [[Federation|federated structure]], with multiple levels where subsets of a system’s entities unite, these entity collections in turn uniting at higher levels, in the direction of and culminating at the central element. The distributed operating system has no discernible or necessary levels; it is purely an autonomous collection of discrete elements.
====Connection====
Line 91:
====Control====
Notice that the centralized and decentralized systems have distinctly directed flows of connection towards the central entity, while the distributed operating system is in no way influenced specifically by virtue of its organization. This is the pivotal notion of the third consideration. What correlations exist between a system’s organization and its associations? In all three cases, it is an extremely delicate balance between the administration of processes and the scope and extensibility of those processes; in essence, it is about the sphere of control. Simply put, in the directed systems there is more control, easing administration of processes, but constraining their possible scope. On the other hand, the distributed operating system is much more difficult to control, but is effectively limited in extensible scope only by the capabilities of that control. The associations of the distributed operating system conform to the needs of its processes, and not inherently in any way to its organizational configuration. There are key collections of extended distributed operating system processes discussed later in this article.
===Conclusions===
Line 98:
==Major Design Considerations==
===Transparency===
Transparency, simply put, is the quality of a distributed operating system to be seen and understood as a '''single-system image'''. Transparency is the greatest overriding consideration in the high-level conceptual design of a distributed operating system. While a simple concept, the consideration of transparency directly affects decision making in every aspect of the design of a distributed operating system. Depending on the degree to which transparency is implemented in a system, certain requirements and/or restrictions may be imposed upon the many design considerations, and the relationships between them.
===Inter-process communication===
Inter-process communication (IPC) is the implementation of general communication, process interaction, and data flow between threads and/or processes both within a system node, and between all nodes in a distributed operating system. The distributed nature of a system's nodes and the multi-level considerations of intra-node and inter-node requirements provide the baseline for high-level IPC design considerations. However, IPC in a distributed operating system is a low-level implementation. IPC is the low-level critical complement to the high-level concept of transparency. Many of the requirements and restrictions imposed on a system as a result of transparency will be accomplished directly or indirectly through IPC. In this sense, IPC is the greatest underlying concept in the low-level design considerations of a distributed operating system.
===Process management===
Process management provides policies and mechanisms for effective and efficient sharing of a system's distributed processing resources between that system's distributed processes. These policies and mechanisms support operations involving the allocation and de-allocation of processes and ports, as well as provisions to run, suspend, migrate, halt, or resume execution of processes, to mention a few. While these distributed operating system resources and the operations on them can be either local or remote with respect to each other, the distributed operating system must still maintain the complete state of, and synchronization over, all processes in the system, and do so in a manner completely consistent from the user's unified system perspective.
As an example, load balancing is a common process management function. One consideration of load balancing is which process should be moved. The kernel may have several mechanisms, one of which might be priority-based choice. This mechanism in the kernel defines '''what can be done'''; in this case, choose a process based on some priority. The system management components would have policies implementing the decision making for this context. One of these policies would define what priority means, and how it is to be used to choose a process in this instance.
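The load-balancing example above can be illustrated with a minimal sketch. All names here (<code>Process</code>, <code>select_by_priority</code>, and the two policy functions) are hypothetical and chosen only for illustration; they do not describe any particular kernel. The mechanism only knows how to pick the highest-priority process, while each policy supplies its own meaning of priority.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Process:
    pid: int
    cpu_share: float   # fraction of the node's CPU used by the process
    memory_mb: int     # resident memory, used here as a proxy for migration cost


# Mechanism (kernel side, hypothetical): choose the process with the highest
# priority. It defines what can be done, not what "priority" means.
def select_by_priority(processes: List[Process],
                       priority: Callable[[Process], float]) -> Process:
    if not processes:
        raise ValueError("no candidate processes")
    return max(processes, key=priority)


# Policy (management side, hypothetical): prefer processes that free the most
# CPU per megabyte that would have to be transferred.
def cheap_to_move_policy(p: Process) -> float:
    return p.cpu_share / max(p.memory_mb, 1)


# An alternative policy: simply move the heaviest CPU consumer.
def heaviest_cpu_policy(p: Process) -> float:
    return p.cpu_share


candidates = [
    Process(pid=1, cpu_share=0.50, memory_mb=2000),
    Process(pid=2, cpu_share=0.30, memory_mb=100),
    Process(pid=3, cpu_share=0.10, memory_mb=50),
]

# The same mechanism yields different choices under different policies.
chosen_cheap = select_by_priority(candidates, cheap_to_move_policy)
chosen_heavy = select_by_priority(candidates, heaviest_cpu_policy)
```

Swapping the policy changes which process is selected without any change to the selection mechanism, which is the separation the example describes.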
Line 112:
===Reliability===
One of the basic tenets of distributed operating systems is a high level of reliability. This quality attribute of a distributed operating system has become a staple expectation. Reliability is most often considered from the perspectives of availability and security of a system's hardware, services, and data. Issues arising from availability failures or security violations are considered faults. Faults are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults. There are three general methods for dealing with faults: fault avoidance, fault tolerance, and fault detection and recovery. Fault avoidance comprises proactive measures taken to minimize the occurrence of faults. These proactive measures can be in the form of transactions, replicated resources and processes, and primary back-ups of complete servers. Fault tolerance is the ability of a system to continue some meaningful level of operation in the face of a fault. In the event a fault does occur, the system should detect the fault and have the capability to respond quickly and effectively to recover full functionality. In any event, any actions taken should make every effort to preserve the single system image.
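As a rough illustration of fault detection combined with replicated resources, the following sketch tries a list of replicas in order and falls back to the next one when a fault is detected. The names <code>ReplicaFault</code> and <code>call_with_failover</code>, and the two stand-in replicas, are assumptions made for this example only; a real system would also need timeouts, consistency between replicas, and repair of the failed node.

```python
from typing import Callable, List, Tuple


class ReplicaFault(Exception):
    """Raised by a replica that cannot serve a request (hypothetical)."""


def call_with_failover(replicas: List[Callable[[str], str]],
                       request: str) -> Tuple[str, List[int]]:
    """Return the first successful answer and the indices of faulted replicas."""
    failed: List[int] = []
    for index, replica in enumerate(replicas):
        try:
            return replica(request), failed
        except ReplicaFault:
            # Fault detected: record it and continue with the next replica,
            # so the caller still sees a single working service.
            failed.append(index)
    raise ReplicaFault("all replicas failed")


def broken_replica(request: str) -> str:
    raise ReplicaFault("node unreachable")


def healthy_replica(request: str) -> str:
    return "result for " + request


answer, faulted = call_with_failover([broken_replica, healthy_replica],
                                     "read block 7")
```

The caller receives an answer without learning which node produced it, which is one small way of preserving the single system image; the list of faulted replicas is returned so that a recovery component could act on it.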
===Performance===
Performance is arguably the quintessential computing concern, and in the distributed operating system, it is no different. Many benchmark metrics exist for performance: throughput, job completions per unit time, system utilization, etc. Each of these benchmarks is more meaningful in describing some scenarios, and less in others. With respect to a distributed operating system, this consideration most often distills to a balance between process parallelism and IPC. Managing the task granularity of parallelism in a sensible relation to the messages required for support is extremely effective. Also, identifying when it is more beneficial to migrate a process to its data, rather than copy the data, is effective as well. Many process and resource management algorithms work to maximize performance.
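The choice between migrating a process to its data and copying the data to the process can be sketched with a simple cost comparison. The model below is an assumption for illustration only: it treats each transfer as a fixed latency plus size divided by bandwidth, and ignores factors such as load on the destination node and repeated accesses. The function names are hypothetical.

```python
def transfer_seconds(size_bytes: int, bandwidth_bytes_per_s: float,
                     latency_s: float) -> float:
    """Estimated time to move size_bytes over one link (simplified model)."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def should_migrate_process(process_state_bytes: int, data_bytes: int,
                           bandwidth_bytes_per_s: float,
                           latency_s: float = 0.001) -> bool:
    """True when moving the process is estimated to be cheaper than moving the data."""
    migrate_cost = transfer_seconds(process_state_bytes,
                                    bandwidth_bytes_per_s, latency_s)
    copy_cost = transfer_seconds(data_bytes,
                                 bandwidth_bytes_per_s, latency_s)
    return migrate_cost < copy_cost


MB = 1_000_000
GB = 1_000 * MB

# Small process, large data set: moving the process is cheaper.
small_process_big_data = should_migrate_process(10 * MB, 10 * GB, 100 * MB)

# Large process, small data set: copying the data is cheaper.
big_process_small_data = should_migrate_process(500 * MB, 1 * MB, 100 * MB)
```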
===Synchronization===
Line 121:
===Flexibility===
Flexibility in a distributed operating system is made possible through the modular characteristics of the microkernel. With the microkernel presenting a minimal, but complete, set of primitives and basic functionally cohesive services, the higher-level management components can be composed in a similar functionally cohesive manner. This capability leads to exceptional flexibility in the collection of management components; but more importantly, it allows the opportunity to dynamically swap, upgrade, or install additional components above the kernel.
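The idea of swapping components above the kernel at run time can be sketched with a small registry. <code>ServiceRegistry</code> and the two scheduler functions are hypothetical names used only for this illustration; they stand in for management components layered over a minimal kernel and do not represent any real microkernel interface.

```python
from typing import Callable, Dict, List, Optional

Service = Callable[[List[int]], int]


class ServiceRegistry:
    """Holds named user-level services that callers reach only by name."""

    def __init__(self) -> None:
        self._services: Dict[str, Service] = {}

    def install(self, name: str, service: Service) -> Optional[Service]:
        """Install or replace a service; return the one it replaced, if any."""
        previous = self._services.get(name)
        self._services[name] = service
        return previous

    def call(self, name: str, ready_queue: List[int]) -> int:
        return self._services[name](ready_queue)


# Two interchangeable scheduling components (illustrative only).
def first_come_scheduler(ready_queue: List[int]) -> int:
    return ready_queue[0]


def lowest_pid_scheduler(ready_queue: List[int]) -> int:
    return min(ready_queue)


registry = ServiceRegistry()
registry.install("scheduler", first_come_scheduler)
before_swap = registry.call("scheduler", [7, 3, 9])

# Swap the component while callers keep using the same service name.
replaced = registry.install("scheduler", lowest_pid_scheduler)
after_swap = registry.call("scheduler", [7, 3, 9])
```

Because callers depend only on the service name, a component can be upgraded or replaced without changes to the code that uses it.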
==Transparency responsibilities==