Distributed operating system

JLSjr (talk | contribs)
No edit summary
Generally, transparency and user-required knowledge are inversely related: the more transparent a system is, the less its users need to know about its internals. As transparency is designed and implemented into various areas of a system, great care must be taken not to adversely affect other areas of transparency and other basic design concerns. Transparency, as a design concept, is one of the grand challenges in the design of a distributed operating system, because achieving it requires a complete upfront understanding of the system.
 
*'''Location transparency''' - Location transparency comprises two distinct aspects: naming transparency and user mobility.
:*'''Naming transparency''' requires that nothing in the physical or logical references to an entity should expose any indication of the entity's location.
:*'''User mobility''' requires consistent referencing of an entity regardless of its location within the system. These two related concepts, naming transparency and user mobility, work together to remove the need for users to know the location details of specific entities within a system.
 
*'''Access transparency''' - Local and remote resources should remain indistinguishable through user interface system calls. The distributed operating system maintains a user's perception of these entities in a clean, clear, and consistent manner. System entities and processes keep a consistent access mechanism, regardless of whether they are local or remote.
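
The following is a minimal sketch of how location and access transparency can fit together, assuming a single name service that privately records where each named entity lives. The classes <code>Node</code> and <code>NameService</code> and their methods are hypothetical, and nodes are simulated in one process; a real system would reach remote nodes through IPC.

```python
"""Hypothetical sketch: location-transparent naming plus uniform access.

All class and method names are illustrative assumptions, not the API of
any real distributed operating system. Nodes are simulated in-process.
"""
from dataclasses import dataclass, field


@dataclass
class Node:
    """A system node holding named objects in its local storage."""
    node_id: str
    objects: dict = field(default_factory=dict)


class NameService:
    """Maps location-independent names to the node currently holding them."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.locations = {}  # name -> node_id; never exposed to callers

    def register(self, name, node_id, value):
        self.nodes[node_id].objects[name] = value
        self.locations[name] = node_id

    def migrate(self, name, new_node_id):
        """Move an object between nodes; its name does not change (user mobility)."""
        old_node = self.nodes[self.locations[name]]
        self.nodes[new_node_id].objects[name] = old_node.objects.pop(name)
        self.locations[name] = new_node_id

    def read(self, name):
        """One call for local and remote objects alike (access transparency)."""
        node = self.nodes[self.locations[name]]
        return node.objects[name]


ns = NameService([Node("a"), Node("b")])
ns.register("/home/report.txt", "a", "draft")
before = ns.read("/home/report.txt")
ns.migrate("/home/report.txt", "b")      # the entity changes location...
after = ns.read("/home/report.txt")      # ...but the caller's reference does not
print(before, after)
```

The name passed to <code>read</code> carries no node identifier, so the same reference keeps working after migration.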
 
===Inter-process communication===
[[Inter-Process Communication]] (IPC) is the implementation of general communication, process interaction, and [[dataflow|data flow]] between [[Thread (computer science)|threads]] and/or [[Process (computing)|processes]], both within a single node and between the nodes of a distributed operating system. The distributed nature of a system's nodes, together with the differing requirements of intra-node and inter-node communication, sets the baseline for high-level IPC design. IPC itself, however, is a low-level implementation: it is the critical low-level complement to the high-level concept of transparency. Many of the requirements and restrictions that transparency imposes on a system are met directly or indirectly through IPC. In this sense, IPC is the most fundamental concept in the low-level design of a distributed operating system.
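
As an illustration only, message-passing IPC can be sketched with per-process mailboxes (ports). In this sketch threads stand in for processes and a <code>queue.Queue</code> stands in for the transport; the <code>Port</code> class and the request/reply convention are assumptions, and a distributed implementation would carry inter-node messages over the network behind the same <code>send</code>/<code>receive</code> interface.

```python
"""Minimal message-passing IPC sketch using per-process mailboxes (ports).

Threads stand in for processes; the Port class and the request/reply
convention are illustrative assumptions.
"""
import queue
import threading


class Port:
    """A mailbox that one process receives from and any process may send to."""

    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg):
        self._q.put(msg)

    def receive(self, timeout=None):
        return self._q.get(timeout=timeout)  # blocks until a message arrives


def server(inbox, stop_token="STOP"):
    """Echo server: answers each request on the reply port the sender supplied."""
    while True:
        msg = inbox.receive()
        if msg == stop_token:
            break
        payload, reply_port = msg
        reply_port.send(payload.upper())


requests = Port()
reply = Port()
t = threading.Thread(target=server, args=(requests,))
t.start()
requests.send(("hello", reply))   # request carries its own reply port
answer = reply.receive(timeout=1.0)
requests.send("STOP")
t.join()
print(answer)  # HELLO
```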
 
===Process management===
[[Process management (computing)|Process management]] provides policies and mechanisms for effective and efficient sharing of a system's distributed processing resources between that system's distributed processes. These policies and mechanisms support operations involving the allocation and de-allocation of processes and ports, as well as provisions to run, suspend, migrate, halt, or resume execution of processes, among others. While these distributed operating system resources and the operations on them can be either local or remote with respect to each other, the distributed operating system must still maintain complete state information about, and synchronization over, all processes in the system, and must do so in a manner completely consistent with the user's unified view of the system.
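
One way to picture these operations is as transitions on a system-wide process record. The sketch assumes a small set of states and the rule that a process must be suspended before it migrates; these states, the rule, and the <code>ProcessRecord</code> class are hypothetical choices for illustration, not a particular system's design.

```python
"""Hypothetical process-control sketch: allowed operations as a state machine.

The states, the suspend-before-migrate rule, and all names are
illustrative assumptions.
"""

TRANSITIONS = {
    ("created", "run"): "running",
    ("running", "suspend"): "suspended",
    ("suspended", "resume"): "running",
    ("suspended", "migrate"): "suspended",  # node changes, state does not
    ("created", "halt"): "halted",
    ("running", "halt"): "halted",
    ("suspended", "halt"): "halted",
}


class ProcessRecord:
    """System-wide record kept for one process, wherever it currently runs."""

    def __init__(self, pid, node):
        self.pid = pid
        self.node = node
        self.state = "created"

    def apply(self, op, target_node=None):
        key = (self.state, op)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {op} process {self.pid} in state {self.state}")
        if op == "migrate":
            if target_node is None:
                raise ValueError("migrate requires a target node")
            self.node = target_node
        self.state = TRANSITIONS[key]
        return self.state


p = ProcessRecord(pid=7, node="node-a")
p.apply("run")
p.apply("suspend")
p.apply("migrate", target_node="node-b")
p.apply("resume")
print(p.state, p.node)  # running node-b
```

Keeping every legal transition in one table makes the rule set easy to audit and gives all nodes a single definition of consistent process state.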
 
As an example, [[Load balancing (computing)|load balancing]] is a common process management function. One consideration of load balancing is which process should be moved. The kernel may offer several mechanisms for this, one of which might be priority-based selection. Such a kernel mechanism defines '''what can be done'''; in this case, choose a process based on some priority. The system management components supply the policies that make the actual decision in this context. One of these policies would define what priority means, and how it is to be used to choose a process in this instance.
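
The split between mechanism and policy can be sketched as a kernel routine that only knows how to pick the highest-priority process, paired with interchangeable policy functions that define what "priority" means. The <code>Proc</code> fields and both policies are illustrative assumptions.

```python
"""Sketch separating a kernel mechanism from management policies.

The mechanism picks the process with the highest priority; each policy
decides what "priority" means. Fields and policies are assumptions.
"""
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Proc:
    pid: int
    memory_mb: int    # size of state that would have to be transferred
    cpu_share: float  # fraction of the local CPU this process is using


def choose_process(procs: Iterable[Proc], priority: Callable[[Proc], float]) -> Proc:
    """Kernel mechanism: return the process the given policy ranks highest."""
    return max(procs, key=priority)


# Management policies: each defines what "priority" means for migration.
def cheapest_to_move(p: Proc) -> float:
    return -p.memory_mb   # smaller state means higher priority


def heaviest_cpu_user(p: Proc) -> float:
    return p.cpu_share    # busier process means higher priority


procs = [Proc(1, 512, 0.10), Proc(2, 64, 0.30), Proc(3, 2048, 0.55)]
print(choose_process(procs, cheapest_to_move).pid)   # 2
print(choose_process(procs, heaviest_cpu_user).pid)  # 3
```

Changing the policy changes the decision without touching the kernel mechanism.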
 
===Resource management===
[[Resource (computer science)|System resources]] such as memory, files, and devices are distributed throughout a system, and at any given moment some of the system's nodes may be lightly loaded or idle. '''Load sharing''' and load balancing require many policy-oriented decisions, ranging from how to find idle CPUs to when to move a process and which process to move. Many [[algorithm]]s exist to aid in these decisions; however, this calls for a second level of decision-making policy: choosing the algorithm best suited to the scenario and the conditions surrounding it.
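
As one concrete example among the many possible algorithms, a simple threshold-based load-sharing rule can answer the "when to move" and "where to move" questions. The run-queue threshold of 3 and the function name <code>find_target</code> are assumptions chosen for illustration.

```python
"""Threshold-based load-sharing sketch (one of many possible algorithms).

A node whose run-queue length exceeds THRESHOLD hands work to the
least-loaded node below the threshold. Values and names are assumptions.
"""
from typing import Dict, Optional

THRESHOLD = 3  # run-queue length above which a node counts as overloaded


def find_target(loads: Dict[str, int], source: str) -> Optional[str]:
    """Decide whether to move work off `source`, and to which node."""
    if loads[source] <= THRESHOLD:
        return None  # not overloaded: do not move
    candidates = {n: load for n, load in loads.items()
                  if n != source and load < THRESHOLD}
    if not candidates:
        return None  # no lightly loaded node was found
    return min(candidates, key=candidates.get)


loads = {"n1": 6, "n2": 2, "n3": 0, "n4": 4}
print(find_target(loads, "n1"))  # n3
print(find_target(loads, "n2"))  # None
```

The threshold itself is a policy parameter, and choosing this rule over, say, a receiver-initiated scheme is the second-level decision described above.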
 
===Reliability===
One of the basic tenets of distributed operating systems is a high level of '''reliability'''. This quality attribute of a distributed operating system has become a staple expectation. Reliability is most often considered from the perspectives of [[availability]] and security of a system's hardware, services, and data. Issues arising from availability failures or security violations are considered faults. [[Fault (technology)|Faults]] are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults.

There are three general methods for dealing with faults: '''fault avoidance''', [[Fault-tolerant design|fault tolerance]], and '''fault detection and recovery'''. Fault avoidance comprises the proactive measures taken to minimize the occurrence of faults. These proactive measures can take the form of '''transactions''', [[Replication (computer science)|replicated resources and processes]], and [[Replication (computer science)#Primary-backup and multi-primary replication|primary back-ups]] of complete servers. Fault tolerance is the ability of a system to continue some meaningful level of operation in the face of a fault. In the event a fault does occur, the system should detect the fault and have the capability to respond quickly and effectively to recover full functionality. In any event, any actions taken should make every effort to preserve the '''single system image'''.
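
Fault detection and recovery can be sketched with heartbeats and primary-backup failover. This sketch assumes a fixed 2-second heartbeat timeout and passes timestamps explicitly so its behaviour is deterministic; the <code>FailoverMonitor</code> class is hypothetical, and a timeout only lets the system ''suspect'' a failure rather than prove it.

```python
"""Timeout-based fault detection with primary-backup failover (sketch).

A server silent for longer than TIMEOUT seconds is suspected to have
failed, and a live backup is promoted so clients keep addressing one
logical service. Timeout, names, and explicit timestamps are assumptions.
"""
TIMEOUT = 2.0  # seconds without a heartbeat before a server is suspected


class FailoverMonitor:
    def __init__(self, primary, backups):
        self.primary = primary
        self.backups = list(backups)
        self.last_seen = {s: 0.0 for s in [primary, *backups]}

    def heartbeat(self, server, now):
        self.last_seen[server] = now

    def check(self, now):
        """Detect a silent primary and recover by promoting a live backup."""
        if now - self.last_seen[self.primary] <= TIMEOUT:
            return self.primary
        live = [b for b in self.backups if now - self.last_seen[b] <= TIMEOUT]
        if live:
            self.backups.remove(live[0])
            self.primary = live[0]  # the failed primary is dropped
        return self.primary


m = FailoverMonitor("srv-a", ["srv-b", "srv-c"])
m.heartbeat("srv-a", 1.0)
m.heartbeat("srv-b", 1.0)
m.heartbeat("srv-b", 4.0)   # srv-a has been silent since t=1.0
print(m.check(now=4.5))     # srv-b
```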
 
===Performance===
[[Computer performance|Performance]] is arguably the quintessential computing concern, and the distributed operating system is no exception. Many [[Benchmark (computing)|benchmark metrics]] exist for performance, such as throughput, job completions per unit time, and system utilization. Each of these benchmarks is more meaningful in describing some scenarios and less meaningful in others. For a distributed operating system, this consideration most often distills to a balance between [[Parallel computing|process parallelism]] and IPC. Choosing a [[Granularity#In computing|task granularity]] for parallelism that is sensible relative to the messages required to support it is highly effective, since tasks that are too fine-grained spend more time communicating than computing. Identifying when it is more beneficial to [[Process migration|migrate a process]] to its data, rather than copy the data to the process, is effective as well. Many process and resource management algorithms in this space work to maximize performance.
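
Both trade-offs can be sketched with a back-of-the-envelope cost model. The linear model (one latency plus bytes divided by bandwidth), the function names, and the example figures are assumptions; a real system would measure these quantities at run time.

```python
"""Back-of-the-envelope cost model for two performance decisions (sketch).

The linear transfer model and all example figures are assumptions.
"""


def transfer_seconds(num_bytes: float, bandwidth_bps: float, latency_s: float) -> float:
    """Time to send num_bytes over a link: one latency plus serialization time."""
    return latency_s + num_bytes * 8 / bandwidth_bps


def should_migrate(process_state_bytes, data_bytes, bandwidth_bps, latency_s):
    """Move the process to its data if that transfers faster than copying the data."""
    move_proc = transfer_seconds(process_state_bytes, bandwidth_bps, latency_s)
    copy_data = transfer_seconds(data_bytes, bandwidth_bps, latency_s)
    return move_proc < copy_data


def worth_parallelizing(compute_s_per_task, messages_per_task, latency_s):
    """Granularity check: a task should compute longer than it spends messaging."""
    return compute_s_per_task > messages_per_task * latency_s


# Assumed link: 1 Gbit/s bandwidth, 0.5 ms latency.
print(should_migrate(50e6, 4e9, 1e9, 5e-4))    # True: 50 MB of state vs 4 GB of data
print(worth_parallelizing(1e-4, 2, 5e-4))      # False: 0.1 ms of work vs 1 ms of messaging
```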
 
===Synchronization===
Cooperating [[Concurrent computing|concurrent processes]] have an inherent need for [[Synchronization (computer science)|synchronization]]. Three basic situations define the scope of this need:
:*one or more processes must synchronize at a given point for one or more other processes to continue,
:*one or more processes must wait for an asynchronous condition in order to continue,
:*or a process must establish mutually exclusive access to a shared resource.

There are many algorithms available for these scenarios, and each has many variations. Unfortunately, whenever synchronization is required, the opportunity for process [[deadlock]] usually exists. The ancillary situation of deadlock is covered below.
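
The three situations listed above map onto familiar primitives. This sketch uses Python's single-node <code>threading</code> module purely as a stand-in: <code>Barrier</code> for meeting at a common point, <code>Event</code> for waiting on an asynchronous condition, and <code>Lock</code> for mutual exclusion. A distributed operating system would have to build equivalents on top of message passing, for example with a distributed mutual-exclusion algorithm.

```python
"""The three synchronization situations, shown with single-node primitives.

Python's threading module stands in for a distributed implementation.
"""
import threading

barrier = threading.Barrier(3)    # 1) all workers meet at a common point
data_ready = threading.Event()    # 2) wait for an asynchronous condition
counter_lock = threading.Lock()   # 3) mutually exclusive access to shared data
counter = 0


def worker():
    global counter
    barrier.wait()        # no worker continues until all 3 have arrived
    data_ready.wait()     # block until another thread signals the condition
    with counter_lock:    # only one thread updates the counter at a time
        counter += 1


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
data_ready.set()          # the asynchronous condition occurs
for t in threads:
    t.join()
print(counter)  # 3
```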
 
===Flexibility===
[[Flexibility (engineering)|Flexibility]] in a distributed operating system is made possible through the modular characteristics of the microkernel. With the microkernel presenting an absolutely minimal, but complete, set of primitives and basic [[Cohesion (computer science)|functionally cohesive]] services, the higher-level management components can be composed in a similarly functionally cohesive manner. This capability leads to exceptional flexibility in the collection of management components; more importantly, it makes it possible to dynamically swap, upgrade, or install additional instances of components above the kernel.
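
A minimal sketch of swapping a component above the kernel at run time, assuming callers reach management services only through a registry. The <code>ServiceRegistry</code> class and the two scheduler components are hypothetical examples.

```python
"""Sketch of hot-swapping a management component above the kernel.

Callers reach services only through the registry, so a component can be
replaced without restarting them. All names are hypothetical.
"""
import threading
from typing import Callable, Dict, List


class ServiceRegistry:
    def __init__(self):
        self._services: Dict[str, Callable] = {}
        self._lock = threading.Lock()

    def install(self, name: str, impl: Callable) -> None:
        """Install a new service, or atomically replace an existing one."""
        with self._lock:
            self._services[name] = impl

    def call(self, name: str, *args):
        with self._lock:
            impl = self._services[name]  # look up the current version
        return impl(*args)


def fifo_scheduler(queue: List[int]) -> int:
    return queue[0]    # version 1: first come, first served


def shortest_first_scheduler(queue: List[int]) -> int:
    return min(queue)  # version 2: shortest job first


registry = ServiceRegistry()
registry.install("scheduler", fifo_scheduler)
jobs = [30, 5, 12]     # estimated run times of queued jobs
print(registry.call("scheduler", jobs))                   # 30
registry.install("scheduler", shortest_first_scheduler)   # swap in place
print(registry.call("scheduler", jobs))                   # 5
```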
 
==Historical perspectives==