Revision/Update History
Near final draft of Lead/Introduction JLSjr (talk) 09:56, 14 April 2010 (UTC)
Again, restructured Lead/Intro sections preparatory to implementing the "body" JLSjr (talk) 10:34, 8 April 2010 (UTC)
Rebuilt lead, in preparation of "Architectural features" overhaul JLSjr (talk) 10:37, 7 April 2010 (UTC)
Updated much previously in lead, into "Description" section JLSjr (talk) 06:33, 30 March 2010 (UTC)
Updated lead section JLSjr (talk) 08:45, 28 March 2010 (UTC)
Added draft of Memory access abstraction, third section of "History"... JLSjr (talk) 07:30, 25 March 2010 (UTC)
Added drafts of Transparency and Modularity, under "Architectural features"... JLSjr (talk) 06:09, 24 March 2010 (UTC)
Added draft of the second section of "History"... JLSjr (talk) 05:44, 23 March 2010 (UTC)
Added draft of the first section of "History"... JLSjr (talk) 05:32, 18 March 2010 (UTC)
Added draft of "Overview" section... JLSjr (talk) 06:45, 17 March 2010 (UTC)
Added draft of "Lead" section... JLSjr (talk) 09:02, 16 March 2010 (UTC)
Added "Introduction" outline; framework of text to come... JLSjr (talk) 04:56, 15 March 2010 (UTC)
Added initial entry... JLSjr (talk) 12:48, 13 March 2010 (UTC)
A Distributed operating system is the logical cumulative aggregation of operating system software within a Distributed System. The distributed operating system – considered collectively – is the foundation for coordinated operation of the distributed system’s independent and autonomous computational nodes.[1] Individual system nodes each contain a discrete subset of the global system’s operating system software. A given node’s system software set reveals a clean division – both physically and logically – between two distinct providers of services.[2]
The first is a minimal, low-level, node-servicing kernel, situated directly above the bare-metal of a node’s hardware. The kernel provides the foundation for all node-level activities. The second is a higher-level collection of system-servicing management components and services, the System Management Servers. This collection of globally-connected management components exists immediately above the microkernel, and below any user applications or APIs that might reside at higher levels.[3] These two entities, the kernel and the management components collection, work together in supporting the distributed operating system’s goal of seamlessly integrating all network-connected resources and functionality into an efficient, available, and unified system.[4]
Overview
The kernel
The kernel is a minimal but complete set of node-level utilities necessary for access to a node’s underlying hardware and resources. These mechanisms provide the complete set of “building-blocks” essential for node operation; mainly low-level allocation, management, and disposition of a node’s resources, processes, communication, and I/O management support functions.[5] These functions are made possible by exposing a concise, yet comprehensive array of primitive mechanisms and services. The kernel is arguably the primary consideration in a distributed operating system; however, within the kernel, the subject of foremost importance is that of a well-structured and highly-efficient communications sub-system.[3]
In a distributed operating system, the kernel is often defined by a relatively, or even absolutely, minimal architecture. A kernel of this design is referred to as a microkernel.[6][7] The microkernel usually contains only the mechanisms and services which, if removed, would render a node or the global system functionally inoperable. The minimal nature of the microkernel strongly enhances a distributed operating system’s modular potential.[8] It is generally the case that the kernel is implemented directly on the bare metal of a node’s hardware; it is also common for a kernel to be replicated over all the nodes in a system.[9] The combination of a kernel’s minimal design and ubiquitous coverage greatly aids in global system extensibility, and the ability to dynamically introduce new nodes or services.[10]
System management components
A node’s system management components are a collection of software server processes that define the policies of a system node. These components are the composite of a node’s system software not directly required within the kernel. These software services support all of the needs of the node; namely communication, process and resource management, reliability, performance, security, scalability, and heterogeneity, to mention just a few. In this capacity the system management components compare directly to the centralized operating system of a single-entity system.[3]
However, these system management components face added challenges in supporting a node's responsibilities to the global system. In addition, the system management components accept the defensive responsibilities inherent to a distributed collection of networked nodes. Quite often, any effort to realize success in a particular area illuminates conflict with similar efforts in other areas. Therefore, a consistent approach of balanced perspective and understanding of the overall system and its goals can help mitigate complexity and quickly identify points of diminishing returns. It is for this purpose that the separation of policy and mechanism is so critical.[10]
Working together as an operating system
The architecture and design of a distributed operating system are specifically aligned with realizing individual node and global system goals, in a manner consistent with separating policy and mechanism. Simply said, a distributed operating system attempts to provide a highly efficient and reliable distributed computing framework with minimal user awareness of the underlying command and control efforts.[8] The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed system, is the functional opportunity of the distributed operating system. However, this opportunity comes at a very high cost in complexity.
The price of complexity
In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system anathema to any user. As such, the logical price of realizing a distributed system – including its operating system – must be calculated in terms of overcoming vast amounts of complexity on many levels, and in many areas. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation.[11] These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system’s overall detail is required from the start.[1] As an aid in this effort, most designers rely strongly on the immense amount of documented experience and research in distributed computing which exists, and continues even today.
Perspectives: past, present, and future
Many notable experts look to the early 1970s for the earliest distributed systems, complete by definition and capable of being considered and implemented wholly. Research and experimentation efforts began in earnest in the mid to late-1970s and continued into the early 1990s, with a few implementations achieving modest commercial success. The subject of distributed operating systems, however, has a much richer historical perspective when considering design issues severally with respect to some of the individual primordial strides towards distributed computing. There are several instances of fundamental and pioneering implementations of primitive distributed system and component concepts dating back to the early 1950s. Looking to the modern distributed system and its future, the accelerating proliferation of multiprocessor systems and multi-core processors has led to a re-emergence of the distributed system concept. The inherent challenges in many-core and multiprocessor science have led to an enormous increase in research related to distributed systems. Many of these research efforts investigate and describe interesting and plausible paradigms for the future of distributed computing.
Description
A distributed operating system is an operating system. This statement may be trivial, but it is not always overt and obvious because the distributed operating system is such an integral part of the distributed system. This idea is analogous to the case of a square. A square might not immediately be recognized as a rectangle. Although possessing all requisite attributes defining a rectangle, a square’s additional attributes and specific configuration provide a disguise. At its core, the distributed operating system provides only the essential services and minimal functionality required of an operating system, but its additional attributes and particular configuration make it different. The distributed operating system fulfills its role as operating system, and does so in a manner indistinguishable from a centralized, monolithic operating system. That is, although distributed in nature, it supports the system’s appearance as a singular, local entity.
An operating system, at a basic level, is expected to isolate and manage the physical complexities of lower-level hardware resources. In turn, these complexities are organized into simplified logical abstractions and presented to higher-level entities as interfaces into the underlying resources. These marshalling and presentation activities take place in a secure and protected environment, often referred to as the “system-level,” and describe a minimal scope of practical operating system functionality. In graphical depictions however, most monolithic operating systems would be illustrated as a discrete container sandwiched between the local hardware resources below and application programs above. The operating system container would be filled with a robust complement of services and functions to support as many potential needs as possible or practical. This full-featured collection of services would reside and execute at the system-level and support higher, “user-level” applications and services.
A distributed operating system, illustrated in a similar fashion, would be a container suggesting minimal operating system functionality and scope. This container would completely cover all disseminated hardware resources, defining the system-level. The container would extend across the system, supporting a layer of modular software components existing in the user-level. These software components supplement the distributed system with a configurable set of added services, usually integrated within the monolithic operating system (and the system-level). This division of minimal system-level function from additional user-level modular services provides a “separation of mechanism and policy.” Mechanism and policy can be simply interpreted as "how something is done" versus "why something is done," respectively. Achieving this separation allows for an exceptionally loosely coupled, flexible, and scalable distributed system.
Distributed computing models
The nature of distribution
The unique nature of the distributed operating system is both subtle and complex. A distributed operating system’s hardware infrastructure elements are not centralized; that is, the elements do not have a tight proximity to one another at a single location. A given distributed operating system’s structure elements could reside in various rooms within a building, or in various buildings around the world. This geographically spatial dissemination defines its decentralization; however, the distributed operating system is a distributed system, not simply decentralized.
This distinction is the source of the subtlety and complexity. While decentralized systems and distributed systems are both spatially diverse, it is the specific manner of and relative degree in linkage between the elements, or nodes in the systems that differentiate the two. In the case of these two types of operating system, these linkages are the lines of communication between the nodes of the system.
Three basic distributions
To better illustrate this point, let us more closely reflect upon these three system architectures: centralized, decentralized, and distributed. In this examination, we will consider three tightly-related aspects of their structure: organization, connection, and control. Organization will describe physical arrangement characteristics, connection will involve associations among constituent structural entities, and control will correlate the manner, necessity, and rationale of the earlier considerations.
Organization
Firstly, we consider the subject of organization. A centralized system is organized most simply: basically one real level of structure, with all constituent elements highly influenced by and ultimately dependent upon this organization. The decentralized system is a more federated structure, with multiple levels where subsets of a system’s entities unite, these entity collections in turn uniting at higher levels, in the direction of and culminating at the central element. The distributed system has no discernible or necessary levels; it is purely an autonomous collection of discrete elements.
Connection
Association linkages between elements will be the second consideration. In each case, physical association is inextricably linked (or not) to conceptual organization. The centralized system has its constituent members directly united to a central entity. One could conceptualize holding a bunch of balloons, each on a string, with the hand being the central figure. A decentralized system incorporates a single-step direct, or multi-step indirect path between any given constituent element and the central entity. This can be understood by thinking of a corporate organizational chart, the first level connecting directly, and lower levels connecting indirectly through successively higher levels (no lateral “dotted” lines). Finally, the distributed system has no inherent pattern; direct and indirect connections are possible between any two given elements of the system. Think of the 1970s phenomenon of “string art,” a spirograph drawing, a spider’s web, or the Interstate Highway System between U.S. cities.
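The three connection patterns can be sketched as small graphs. This is only an illustration under stated assumptions: the function names are hypothetical, the decentralized case is modelled as a tree with a fixed fan-out, and the distributed case is modelled as a full mesh even though the text only requires that any two elements may be connected.

```python
from collections import deque

def centralized(n):
    """Star: every element links directly to the central entity (node 0)."""
    return {(0, i) for i in range(1, n)}

def decentralized(n, fanout=2):
    """Tree: each element links only to its parent, in the direction of node 0."""
    return {((i - 1) // fanout, i) for i in range(1, n)}

def distributed(n):
    """No imposed pattern; modelled here as a full mesh (an assumption)."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)}

def hops(edges, src, dst):
    """Breadth-first search: number of links on the shortest path src -> dst."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # the two elements are not connected
```

With seven elements, two non-central elements are two links apart in the star (through the hand holding the balloons), up to four links apart in the tree, and one link apart in the mesh.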
Control
Notice that the centralized and decentralized systems have distinctly directed flows of connection towards the central entity, while the distributed system is in no way influenced specifically by virtue of its organization. This is the pivotal notion of the third consideration. What correlations exist between a system’s organization and its associations? In all three cases, it is an extremely delicate balance between the administration of processes, and the scope and extensibility of those processes; in essence, it is about the sphere of control. Simply put, in the directed systems there is more control, easing administration of processes, but constraining their possible scope. On the other hand, the distributed system is much more difficult to control, but is effectively limited in extensible scope only by the capabilities of that control. The associations of the distributed system conform to the needs of its processes, and not inherently in any way to its organizational configuration. There are key collections of extended distributed operating system processes discussed later in this article.
Conclusions
Lastly, as to the nature of the distributed system, it has been stated that a distributed operating system is not necessarily an operating system at all, but simply "is" the distributed system. This view is commonly justified by pointing to its deep and inextricable integration into the distributed system. Its absolute and singular focus on sustaining and maintaining the system is also used as rationale. However, it is important to remember the separation of mechanism and policy. The distributed operating system and its mechanism are not affected by any degree of integration, and no amount of focus on providing this mechanism changes the responsibility of policy, or expectation of results at the distributed system level. As mentioned earlier, a square is a rectangle; and no level of effort exerted by the square in maintaining four equivalent dimensions changes anything.
Major Design Considerations
Transparency
Transparency, simply put, is the quality of a distributed system to be seen and understood as a single-system image; it is by far the greatest overriding consideration in the high-level conceptual design of a distributed operating system. While a simple concept, this one issue touches and affects decision making in almost every aspect of design by introducing requirements or restrictions on those aspects, and often on their relationships with others. Inter-Process Communication (IPC) is the critical low-level, implementation-side complement to transparency. General communications, process interactions, and data flows all depend on IPC sub-systems. Each situation requires fast, efficient, and reliable exchange capabilities, which in turn require both efficient primitives and stable protocols. And while this often leads to various scenario-specific solutions, the calling interface must be consistent.
Process Management
Process management is a global system concept, which provides mechanisms for effective and efficient use and sharing of processing resources throughout the system. These resources, and operations on them, can be either local or remote; however, in either event, they must remain completely consistent from the user perspective. As an example, load balancing is an important process management function. Some of the questions involved are which process to move, and when and where to move it. These are policy decisions relegated to Resource Management; but the migration of the process (e.g., moveProcess(fromA, toB)) is a mechanism implementation of Process Management. The migration process, either local to another core or remote to another computer, again must remain consistent in presentation to the user. Other functions of this sub-system include the allocation and de-allocation of processes and ports, as well as provisions to run, suspend, and resume execution of a process. Again, these are mechanisms, related only to "what" is done, not which one, how, or where.
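A minimal sketch of this split, assuming a hypothetical `Node` type and one hypothetical balancing policy: `move_process` plays the role of the moveProcess mechanism mentioned above and decides nothing, while the policy function alone chooses which process to move, when, and where.

```python
class Node:
    """A hypothetical node that simply tracks the processes it hosts."""
    def __init__(self, name):
        self.name = name
        self.processes = []

def move_process(pid, from_node, to_node):
    """Mechanism: relocate one process; it takes no part in choosing it."""
    from_node.processes.remove(pid)
    to_node.processes.append(pid)

def least_loaded_policy(nodes):
    """Policy (assumed, one of many possible): busiest node sheds to idlest."""
    busiest = max(nodes, key=lambda n: len(n.processes))
    idlest = min(nodes, key=lambda n: len(n.processes))
    if len(busiest.processes) - len(idlest.processes) < 2:
        return None  # loads differ by at most one; nothing worth moving
    return busiest.processes[-1], busiest, idlest

def balance(nodes, policy):
    """Apply the policy's decisions with the mechanism until none remain."""
    moves = 0
    while (decision := policy(nodes)) is not None:
        move_process(*decision)
        moves += 1
    return moves
```

Because `balance` accepts the policy as a parameter, a different decision rule can be substituted without touching the migration mechanism, which is the point of the separation.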
Resource Management
System resources such as memory, files, devices, etc. are distributed throughout a system, and at any given moment, any of the nodes holding them may have light to idle workloads. Load sharing and load balancing require many policy-oriented decisions, ranging from finding idle CPUs to deciding when to move a process and which process to move. Many algorithms exist to aid in these decisions; however, this calls for a second level of decision-making policy in choosing the algorithm best suited for the scenario, and the conditions surrounding the scenario.
Reliability
One of the basic tenets of distributed systems is a high level of reliability. This quality attribute of a distributed system has become a staple expectation. Reliability is most often considered from the perspectives of availability and security of a system's hardware, services, and data. Issues arising from availability failures or security violations are considered faults. Faults are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults. There are three general methods for dealing with faults: fault avoidance, fault tolerance, and fault detection and recovery. Fault avoidance comprises proactive measures taken to minimize the occurrence of faults, and fault tolerance is the ability of a system to continue some level of operation in the face of a fault. In the event a fault does occur, the system should detect the fault and have the capability to respond quickly and effectively to recover full functionality.
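As a rough illustration of fault detection combined with fault tolerance, the sketch below retries a request against replicated service instances. Everything here is hypothetical: `ReplicaDown`, `call_with_failover`, and the two stand-in replicas are invented for the example, and a real system would also need timeouts and a repair step for the failed replicas.

```python
class ReplicaDown(Exception):
    """Raised by a replica to signal a fault (hypothetical)."""

def call_with_failover(replicas, request):
    """Try each replica in turn; a detected fault triggers failover to the next."""
    failed = []
    for replica in replicas:
        try:
            return replica(request), failed
        except ReplicaDown:
            failed.append(replica)  # fault detected; remember it for recovery
    raise ReplicaDown("all replicas failed")

def broken(request):
    """Stand-in for a replica on an unreachable node."""
    raise ReplicaDown("node unreachable")

def healthy(request):
    """Stand-in for a working replica."""
    return request.upper()
```

The caller receives a normal result even though the first replica failed, and the returned list of failed replicas gives a recovery component something to act on.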
Performance
Performance is arguably the quintessential computing concern, and in the distributed system, it is no different. Many benchmark metrics exist for performance: throughput, job completions per unit time, system utilization, etc. Each of these benchmarks is more meaningful in describing some scenarios, and less in others. With respect to a distributed system, this consideration most often distills to a balance between process parallelism and IPC. Managing the task granularity of parallelism in a sensible relation to the messages required for support is extremely effective. Also, identifying when it is more beneficial to migrate a process to its data, rather than copy the data, is effective as well. Many process and resource management algorithms work to maximize performance in this space.
Synchronization
Cooperating concurrent processes have an inherent need for synchronization. Three basic situations define the scope of this need: one or more processes must synchronize at a given point for one or more other processes to continue, one or more processes must wait for an asynchronous condition in order to continue, or a process must establish mutually exclusive access to a shared resource. There is a multitude of algorithms available for these scenarios, and their many variations. Unfortunately, whenever synchronization is required the opportunity for process deadlock usually exists. The ancillary situation of deadlock is covered below.
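The three situations can be sketched on a single machine with Python's standard `threading` primitives. This is an analogy only: threads stand in for cooperating processes, and `run_demo` with its parameters is a hypothetical name; a distributed system would need message-based equivalents of these primitives.

```python
import threading

def run_demo(workers=4, increments=1000):
    barrier = threading.Barrier(workers)  # situation 1: synchronize at a point
    ready = threading.Event()             # situation 2: await an async condition
    lock = threading.Lock()               # situation 3: mutually exclusive access
    state = {"counter": 0}

    def worker():
        ready.wait()      # block until the condition is signalled
        barrier.wait()    # block until every worker has arrived
        for _ in range(increments):
            with lock:    # only one thread updates the shared counter at a time
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    ready.set()           # the awaited condition occurs
    for t in threads:
        t.join()
    return state["counter"]
```

With the lock in place the final count always equals `workers * increments`; removing it would allow lost updates.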
Flexibility
Flexibility in a distributed system is made possible through the modular characteristics of the microkernel. With the microkernel presenting a minimal, but complete, set of primitives and basic functionally cohesive services, the higher-level management components can be composed in a similar functionally cohesive manner. This capability leads to exceptional flexibility in the management components collection; but more importantly, it allows the opportunity to dynamically swap, upgrade, or install additional components above the kernel.
Transparency Responsibilities
Location Transparency
System should create and maintain the user's perception and understanding of the entirety of the system, its devices, and resources as local entities. At no point in any user's system experience should there exist any expectation of any user to be
Access Transparency
System entities or processes maintain consistent access/entry mechanism, regardless of being local or remote
Migration Transparency
Resources and processes can be migrated, without user knowledge, by the system to another node in an attempt to maximize efficiency, reliability, and security. This requires policy decision-making abilities and naming stability; in the event of a process migration, all IPC messages must be received or held pending the migration.
Replication Transparency
Systems entities can be copied to strategic points in the system to increase efficiencies through better proximity, and also provide for improved reliability through the distributed replication as a back-up; prompted by dynamic stratagem.
Concurrency Transparency
System should possess and exhibit properties to allow multiple simultaneous uses of system resources between users who are kept unaware of the concurrent usage. Required properties are synchronization mechanisms to keep events ordered and consistent, mutual-exclusivity management for resources, and sufficient capabilities to detect and recover from both starvation and deadlock.
Parallel Transparency
System should have stable performance characteristics, regardless of whether some nodes increase rapidly in workload, through properties of migration, replication, and concurrency. This requires an intelligent policy decision stratagem to facilitate the timely and accurate allocation, migration, and disposition of resources.
Failure Transparency
The system should shield users from the knowledge of and the effects resulting from failures. In the event of a partial failure, the system is responsible for rapid and accurate detection and orchestration of a remedy with little, if any, imposition on users. These methods can range from static proactive posturing to dynamic and more flexible response mechanisms.
Performance Transparency
System should create and maintain a reasonable, stable, and predictable performance expectation for the user, that is both resilient to and helpful in situations where parts of the system may experience significant delay or even failure. While reasonable and predictable are important, there should be no inherent expectation or expressed indication of fairness or equality.
Name Transparency
All system entities should maintain a complete decoupling of entity naming from any spatial or temporal location, as well as from any other system entity.
Size/Scale Transparency
A user's experience or perception of their system should remain stable and consistent in the face of system extension, scaling, or waning due to failure.
Revision Transparency
System users should be completely oblivious to system-software version changes and changes in internal implementation of system infrastructure. While a user may become aware of, or discover the availability of, a new function or service, the implementation or alteration of the system's internal structure should in no way be the prompt for this discovery.
Control Transparency
All system constants, properties, configuration settings, etc. should be completely consistent in appearance, connotation, and denotation to all users and software applications aware of them.
Data Transparency
No system data-entity should expose itself as peculiar when required to interact remotely.
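Access transparency, in particular, lends itself to a short sketch: callers use one interface and cannot tell whether the entity behind it is local or remote. The class names and the `fetch` callable below are hypothetical stand-ins; no real transport is implied.

```python
from abc import ABC, abstractmethod

class Resource(ABC):
    """The single access mechanism that every resource presents to callers."""
    @abstractmethod
    def read(self):
        ...

class LocalResource(Resource):
    def __init__(self, data):
        self._data = data

    def read(self):
        return self._data

class RemoteResource(Resource):
    """Stands in for a resource on another node; `fetch` is a hypothetical transport."""
    def __init__(self, fetch):
        self._fetch = fetch

    def read(self):
        return self._fetch()

def word_count(resource):
    """Caller code is identical whether the resource is local or remote."""
    return len(resource.read().split())
```

For example, `word_count(LocalResource("a b c"))` and `word_count(RemoteResource(lambda: "a b c"))` go through exactly the same call path from the caller's point of view.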
Historical Perspectives
Pioneering inspirations
With a cursory glance around the internet, or a modest perusal of pertinent writings, one could very easily gain the notion that computer operating systems were a new phenomenon in the mid-twentieth century. In fact, important research in operating systems was being conducted at this time.[12][13][14][15][16][17] While early exploration into operating systems took place in the years leading to 1950, shortly afterward highly advanced research began on new systems to conquer new problems. In the first decade of the second half of the 20th century, many new questions were asked, many new problems were identified, and many solutions were developed and worked for years in controlled production environments.
Aboriginal Distributed Computing
The DYSEAC[18] (1954)
One of the first solutions to these new questions was the DYSEAC, a self-described general-purpose synchronous computer; but at this point in history, exhibited signs of being much more than general-purpose. In one of the earliest publications of the ACM, in April of 1954, a researcher at the National Bureau of Standards – now the National Institute of Standards and Technology (NIST) – presented a detailed implementation design specification of the DYSEAC. Without carefully reading the entire specification, one could be misled by summary language in the introduction, as to the nature of this machine. The initial section of the introduction advises that major emphasis will be focused upon the requirements of the intended applications, and these applications would require flexible communication. However, suggesting the external devices could be typewriters, magnetic medium, and CRTs, and with the term “input-output operation” used more than once, could quickly limit any paradigm of this system to a complex centralized “ensemble.” Seemingly, saving the best for last, the author eventually describes the true nature of the system.
Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.
— ALAN L. LEINER, System Specifications for the DYSEAC
While this more detailed description elevates the perception of the system, the best that can be distilled from this is some semblance of decentralized control. The avid reader, persevering in the investigation, would get to a point at which the real nature of the system is divulged.
Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …it should be noted that the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.
— ALAN L. LEINER, System Specifications for the DYSEAC
This is one of the earliest examples of a computer with distributed control. Dept. of the Army reports[19] show it was certified reliable and passed all acceptance tests in April of 1954. It was completed and delivered on time, in May of 1954. In addition, was it mentioned that this was a portable computer? It was housed in a tractor-trailer, and had 2 attendant vehicles and 6 tons of refrigeration capacity.
Multi-programming abstraction
The Lincoln TX-2[20] (1957)
Described as an input-output system of experimental nature, the Lincoln TX-2 placed a premium on flexibility in its association of simultaneously operational input-output devices. The design of the TX-2 was modular, supporting a high degree of modification and expansion, as well as flexibility in operating and programming of its devices. The system employed The Multiple-Sequence Program Technique.
This technique allowed for multiple program counters to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process, but also the control flow of sequences and switching of devices as well. Much discussion ensues related to the complexity and sophistication in the sequence capabilities of devices.
Similar to the previous system, the TX-2 discussion has a distinct decentralized theme until it is revealed that efficiencies in system operation are gained when separate programmed devices are operated simultaneously. It is also stated that the full power of the central unit can be utilized by any device; and it may be used for as long as the device's situation requires. In this, we see the TX-2 as another example of a system exhibiting distributed control, its central unit not having dedicated control.
Memory access abstraction
Intercommunicating Cells, Basis for a Distributed Logic Computer[21] (1962)
One early memory access paradigm was Intercommunicating Cells, where a cell is composed of a collection of memory elements. A memory element was basically a electronic flip-flop or relay, capable of two possible values. Within a cell there are two types of elements, symbol and cell elements. Each cell structure stores data in a string of symbols, consisting of a name and a set of associated parameters. Consequently, a system's information is linked through various associations of cells.
Intercommunicating Cells fundamentally breaks from tradition in that it has no counters or any concept of addressing memory. The theory contends that addressing is a wasteful level of indirection that adds no value. Information is accessed in two ways: direct retrieval and cross-retrieval. Direct retrieval looks up a name and returns a parameter set. Cross-retrieval projects through parameter sets and returns the set of names whose parameter sets contain the given subset of parameters. This would be similar to a modified hash table data structure that allows multiple values (parameters) for each key (name).
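The hash-table analogy can be made concrete with a short sketch. This is only an illustration of the two retrieval modes, not a reconstruction of Lee's design: the class name `CellMemory` and its methods `store`, `direct` and `cross` are hypothetical. Note also that `cross` here scans every entry in software, whereas the proposed cellular hardware is described as retrieving in roughly constant time.

```python
class CellMemory:
    """Toy model: each name is associated with a set of parameters."""

    def __init__(self):
        self._cells = {}  # name -> set of parameters

    def store(self, name, *params):
        # Associate additional parameters with a name.
        self._cells.setdefault(name, set()).update(params)

    def direct(self, name):
        # Direct retrieval: name -> parameter set (empty if unknown).
        return set(self._cells.get(name, ()))

    def cross(self, *params):
        # Cross-retrieval: parameters -> names whose parameter sets
        # contain every requested parameter.
        wanted = set(params)
        return {name for name, ps in self._cells.items() if wanted <= ps}


mem = CellMemory()
mem.store("apple", "red", "fruit")
mem.store("cherry", "red", "fruit", "small")
mem.store("brick", "red")

by_name = mem.direct("apple")            # parameters stored under one name
by_params = mem.cross("red", "fruit")    # names sharing both parameters
```

No address appears anywhere in the interface: callers refer to information only by name or by content, which is the property the paragraph above emphasizes.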
Cellular memory would have many advantages:
- A major portion of a system's logic is distributed within the associations of information stored in the cells.
- This flow of information association is somewhat guided by the act of storing and retrieving.
- The time required for storage and retrieval is mostly constant and completely unrelated to the size and fill-factor of the memory.
- Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size.
This early research into alternative memory describes a configuration well suited to a distributed operating system. The constant-time projection through memory for storage and retrieval would be inherently atomic and exclusive. The cellular memory's intrinsically distributed characteristics would be an invaluable benefit; however, the impact on user, hardware/device, or application programming interfaces is uncertain. These early researchers clearly had a distributed system concept in mind, as they state:
We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.
— Chung-Yeol (C. Y.) Lee, Intercommunicating Cells, Basis for a Distributed Logic Computer
Component abstraction
HYDRA: The Kernel of a Multiprocessor Operating System[22] (1974)
The design philosophy of HYDRA ... suggest that, at the heart of the system, one should build a collection of facilities of "universal applicability" and "absolute reliability" -- a set of mechanisms from which an arbitrary set of operating system facilities and policies can be conveniently, flexibly, efficiently, and reliably constructed.
Defining a kernel with all the attributes given above is difficult, and perhaps impractical... It is, nevertheless, the approach taken in the HYDRA system. Although we make no claim either that the set of facilities provided by the HYDRA kernel ... we do believe the set provides primitives which are both necessary and adequate for the construction of a large and interesting class of operating environments. It is our view that the set of functions provided by HYDRA will enable the user of C.mmp to create his own operating environment without being confined to predetermined command and file systems, execution scenarios, resource allocation policies, etc.
Initial composition
The National Software Works: A Distributed Processing System[23] (1977)
The National Software Works (NSW) is a significant new step in the development of distributed processing systems and computer networks. NSW is an ambitious project to link a set of geographically distributed and diverse hosts with an operating system which appears as a single entity to a prospective user.
Complete instantiation
The Roscoe Distributed Operating System[24] (1979)
Roscoe is an operating system implemented at the University of Wisconsin that allows a network of microcomputers to cooperate to provide a general-purpose computing facility. The goal of the Roscoe network is to provide a general-purpose computation resource in which individual resources such as files and processors are shared among processes and control is distributed in a non-hierarchical fashion. All processors are identical. Similarly, all processors run the same operating system kernel. However, they may differ in the peripheral units connected to them. No memory is shared between processors. All communication involves messages explicitly passed between physically connected processors. No assumptions are made about the topology of interconnection.
The decision not to use logical or physical sharing of memory for communication is influenced both by the constraints of currently available hardware and by our perception of cost bottlenecks likely to arise as the number of processors increases.
Foundational Work
Coherent memory abstraction
Algorithms for scalable synchronization on shared-memory multiprocessors[25]
A √N algorithm for mutual exclusion in decentralized systems[26]
File System abstraction
Measurements of a distributed file system[27]
Memory coherence in shared virtual memory systems[28]
Transaction abstraction
Transactions
Sagas[29]
Transactional Memory
Composable memory transactions[30]
Transactional memory: architectural support for lock-free data structures[31]
Software transactional memory for dynamic-sized data structures[32]
Software transactional memory[33]
Persistence abstraction
OceanStore: an architecture for global-scale persistent storage[34]
Coordinator abstraction
Weighted voting for replicated data[35]
Consensus in the presence of partial synchrony[36]
Reliability abstraction
Sanity checks
The Byzantine Generals Problem[37]
Fail-stop processors: an approach to designing fault-tolerant computing systems[38]
Recoverability
Distributed snapshots: determining global states of distributed systems[39]
Optimistic recovery in distributed systems[40]
Current Research
Replicated model extended to a component object model
Architectural Design of E1 Distributed Operating System[41]
The Cronus distributed operating system[42]
Fine-grained mobility in the emerald system[43]
Design and development of MINIX distributed operating system[44]
Future Directions
Systems able to provide low-level complexity exposure, in proportion to trust and accepted responsibility
Application performance and flexibility on exokernel systems.[45]
Scale and performance in the Denali isolation kernel.[46]
Infrastructures focused on multi-processor/core processing
The multikernel: a new OS architecture for scalable multicore systems.[47]
Corey: an Operating System for Many Cores.[48]
Systems extending a consistent and stable impression of distributed processing over extremes in heterogeneity
Helios: heterogeneous multiprocessing with satellite kernels.[49]
Systems able to provide effective, stable, and beneficial views of vastly increased complexity on multiple levels
Tessellation
See Also
- Coming Soon...
References
- ^ a b Tanenbaum, Andrew S. 1993 Distributed operating systems anno 1992. What have we learned so far? Distributed Systems Engineering, 1, 1 (1993), 3-10
- ^ Nutt, G. J. 1992 Centralized and Distributed Operating Systems. Prentice Hall Press.
- ^ a b c Goscinski, A. 1991 Distributed Operating Systems: the Logical Design. 1st. Addison-Wesley Longman Publishing Co., Inc.
- ^ Fortier, P. J. 1986 Design of Distributed Operating Systems: Concepts and Technology. Intertext Publications, Inc., McGraw-Hill, Inc.
- ^ P. Brinch Hansen, Ed. 2000 Classic Operating Systems: from Batch Processing to Distributed Systems. Springer-Verlag New York, Inc.
- ^ Pecheur, C. 1992. Using LOTOS for specifying the CHORUS distributed operating system kernel. Comput. Commun. 15, 2 (Mar. 1992), 93-102.
- ^ Habert, S. and Mosseri, L. 1990. COOL: kernel support for object-oriented environments. In Proceedings of the European Conference on Object-Oriented Programming on Object-Oriented Programming Systems, Languages, and Applications (Ottawa, Canada). OOPSLA/ECOOP '90. ACM, New York, NY, 269-275.
- ^ a b Sinha, P. K. 1996 Distributed Operating Systems: Concepts and Design. 1st. Wiley-IEEE Press.
- ^ Galli, D. L. 1999 Distributed Operating Systems: Concepts and Practice. 1st. Prentice Hall PTR.
- ^ a b Chow, R. and Chow, Y. 1997 Distributed Operating Systems and Algorithms. Addison-Wesley Longman Publishing Co., Inc.
- ^ Surajbali, B., Coulson, G., Greenwood, P., and Grace, P. 2007. Augmenting reflective middleware with an aspect orientation support layer. In Proceedings of the 6th international Workshop on Adaptive and Reflective Middleware: Held At the ACM/IFIP/USENIX international Middleware Conference (Newport Beach, CA, November 26 - 30, 2007). ARM '07. ACM, New York, NY, 1-6.
- ^ Dreyfuss, P. 1958. System design of the Gamma 60. In Proceedings of the May 6-8, 1958, Western Joint Computer Conference: Contrasts in Computers (Los Angeles, California, May 06 - 08, 1958). IRE-ACM-AIEE '58 (Western). ACM, New York, NY, 130-133.
- ^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09 - 13, 1957). IRE-ACM-AIEE '57
- ^ Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.
- ^ Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.
- ^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335.
- ^ Estrin, G. 1960. Organization of computer systems: the fixed plus variable structure computer. In Papers Presented At the May 3-5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03 - 05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.
- ^ Leiner, A. L. 1954. System Specifications for the DYSEAC. J. ACM 1, 2 (Apr. 1954), 57-81.
- ^ Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961
- ^ Forgie, J. W. 1957. The Lincoln TX-2 input-output system. In Papers Presented At the February 26-28, 1957, Western Joint Computer Conference: Techniques For Reliability (Los Angeles, California, February 26 - 28, 1957). IRE-AIEE-ACM '57 (Western). ACM, New York, NY, 156-160.
- ^ Lee, C. Y. 1962. Intercommunicating cells, basis for a distributed logic computer. In Proceedings of the December 4-6, 1962, Fall Joint Computer Conference (Philadelphia, Pennsylvania, December 04 - 06, 1962). AFIPS '62 (Fall).
- ^ Wulf, W., Cohen, E., Corwin, W., Jones, A., Levin, R., Pierson, C., and Pollack, F. 1974. HYDRA: the kernel of a multiprocessor operating system. Commun. ACM 17, 6 (Jun. 1974), 337-345.
- ^ Millstein, R. E. 1977. The National Software Works: A distributed processing system. In Proceedings of the 1977 Annual Conference ACM '77. ACM, New York, NY, 44-52.
- ^ Solomon, M. H. and Finkel, R. A. 1979. The Roscoe distributed operating system. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79.
- ^ Mellor-Crummey, J. M. and Scott, M. L. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.
- ^ Maekawa, M. 1985. A √N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst. 3, 2 (May 1985), 145-159.
- ^ Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. Measurements of a distributed file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13 - 16, 1991). SOSP '91. ACM, New York, NY, 198-212.
- ^ Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.
- ^ Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27 - 29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.
- ^ Harris, T., Marlow, S., Peyton-Jones, S., and Herlihy, M. 2005. Composable memory transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15 - 17, 2005). PPoPP '05. ACM, New York, NY, 48-60.
- ^ Herlihy, M. and Moss, J. E. 1993. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16 - 19, 1993). ISCA '93. ACM, New York, NY, 289-300.
- ^ Herlihy, M., Luchangco, V., Moir, M., and Scherer, W. N. 2003. Software transactional memory for dynamic-sized data structures. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13 - 16, 2003). PODC '03. ACM, New York, NY, 92-101.
- ^ Shavit, N. and Touitou, D. 1995. Software transactional memory. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing (Ottawa, Ontario, Canada, August 20 - 23, 1995). PODC '95. ACM, New York, NY, 204-213.
- ^ Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Wells, C., and Zhao, B. 2000. OceanStore: an architecture for global-scale persistent storage. In Proceedings of the Ninth international Conference on Architectural Support For Programming Languages and Operating Systems (Cambridge, Massachusetts, United States). ASPLOS-IX. ACM, New York, NY, 190-201.
- ^ Gifford, D. K. 1979. Weighted voting for replicated data. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79. ACM, New York, NY, 150-162
- ^ Dwork, C., Lynch, N., and Stockmeyer, L. 1988. Consensus in the presence of partial synchrony. J. ACM 35, 2 (Apr. 1988), 288-323.
- ^ Lamport, L., Shostak, R., and Pease, M. 1982. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.
- ^ Schlichting, R. D. and Schneider, F. B. 1983. Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238.
- ^ Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63-75.
- ^ Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3
- ^ L.B. Ryzhyk, A.Y. Burtsev. Architectural design of E1 distributed operating system. System Research and Information Technologies international scientific and technical journal, October 2004, Kiev, Ukraine.
- ^ Vinter, S. T. and Schantz, R. E. 1986. The Cronus distributed operating system. In Proceedings of the 2nd Workshop on Making Distributed Systems Work (Amsterdam, Netherlands, September 08 - 10, 1986). EW 2. ACM, New York, NY, 1-3.
- ^ Jul, E., Levy, H., Hutchinson, N., and Black, A. 1987. Fine-grained mobility in the emerald system. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (Austin, Texas, United States, November 08 - 11, 1987). SOSP '87. ACM, New York, NY, 105-106.
- ^ Ramesh, K. S. 1988. Design and development of MINIX distributed operating system. In Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science (Atlanta, Georgia, United States). CSC '88. ACM, New York, NY, 685.
- ^ M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. Application performance and flexibility on exokernel systems. In the Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), Saint-Malo, France, October 1997.
- ^ Whitaker, A., Shaw, M., and Gribble, S. D. 2002. Scale and performance in the Denali isolation kernel. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation.
- ^ Baumann, A., Barham, P., Dagand, P., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. 2009. The multikernel: a new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 - 14, 2009). SOSP '09.
- ^ S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Corey: an Operating System for Many Cores. Proceedings of the 2008 Symposium on Operating Systems Design and Implementation (OSDI), December 2008.
- ^ Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. 2009. Helios: heterogeneous multiprocessing with satellite kernels. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 - 14, 2009). SOSP '09.
Further Reading
- Coming Soon...
External links
- Coming Soon...