Distributed operating system

This is an old revision of this page, as edited by JLSjr (talk | contribs) at 09:58, 15 April 2010. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.






A Distributed operating system is the logical cumulative aggregation of operating system software within a Distributed System. The distributed operating system – considered collectively – is the foundation for coordinated operation of the distributed system’s independent and autonomous computational nodes.[1] Individual system nodes each contain a discrete subset of the global system’s operating system software. A given node’s system software set reveals a clean division – both physically and logically – between two distinct providers of services.[2]

The first is a minimal, low-level, node-servicing kernel, situated directly above the bare-metal of a node’s hardware. The kernel provides the foundation for all node-level activities. The second is a higher-level collection of system-servicing management components and services, the System Management Servers. This collection of globally-connected management components exists immediately above the microkernel, and below any user applications or APIs that might reside at higher levels.[3] These two entities, the kernel and the management components collection, work together in supporting the distributed operating system’s goal of seamlessly integrating all network-connected resources and functionality into an efficient, available, and unified system.[4]


Overview

The Kernel

The Kernel is a minimal, but complete set of node-level utilities necessary for access to a node’s underlying hardware and resources. These mechanisms provide the complete set of “building-blocks” essential for node operation; mainly low-level allocation, management, and disposition of a node’s resources, processes, communication, and I/O management support functions. These functions are made possible by exposing a concise, yet comprehensive array of primitive mechanisms and services. The kernel is arguably the primary consideration in a distributed operating system; however, within the kernel, the subject of foremost importance is that of a well-structured and highly-efficient communications sub-system.[3]


In a distributed operating system, the kernel is often defined by an architecture that ranges from relatively minimal to absolutely minimal. A kernel of this design is referred to as a microkernel.[5][6] The microkernel usually contains only those mechanisms and services whose removal would leave a node, or the global system, unable to function. The minimal nature of the microkernel strongly enhances a distributed operating system’s modular potential.[7] The kernel is generally implemented directly on the bare metal of a node’s hardware; it is also common for a kernel to be replicated over all nodes.[8]


A well-devised minimal microkernel of functionally-cohesive modular architecture can exhibit an advanced level of flexibility in adapting to heterogeneous hardware, and in supporting different organizational paradigms in higher-level structures; all from a single replicated entity. This ubiquitous quality of a system’s kernel also supports much greater system-level flexibility and scalability. The combination of a kernel’s minimal design and ubiquitous coverage greatly aids in global system extensibility, and the ability to dynamically introduce new nodes or services.[9]

System Management Components

A node’s management server collection is the composite of the node’s system software that is not directly required within the kernel and that supports the node’s responsibilities to the overall system. These responsibilities focus principally on transparency, or the “system image,” and subsequently on achieving the global system goals of efficiency, flexibility, consistency, and reliability. Transparency, with respect to the traditional operating system, is the abstraction of a difficult or tedious aspect of a system into a more acceptable or desirable quality.


In a distributed system, the exceptional degree of inherent complexity could easily make the system and its operating system anathema to any user. As a result, transparency is a critical point of focus in most, if not all, areas of the system, namely with respect to performance, failure, access, migration, concurrency, and location, to mention just a few. Quite often, an effort to realize success in one area exposes a conflict with efforts in others. A consistent and balanced understanding of the overall system can therefore help identify points of diminishing returns quickly.

Together as an Operating System

The architecture and design of a distributed operating system are specifically aligned with realizing these goals and transparencies, in an attempt to protect the user from the issues arising from the system’s physically separated state. Simply said, a distributed operating system attempts to provide a highly efficient and reliable computing framework with a minimum of user awareness of the underlying command and control efforts. The multi-level collaboration between a kernel and its management components, and in turn between distinct nodes in a distributed system, is the key functional space of the distributed operating system. However, this opportunity comes at a very high price.


The logical price of realizing a distributed system – including its operating system – must be calculated in terms of overcoming vast amounts of complexity on many levels, and in many areas. This calculation includes the depth, breadth, and range of design investment and architectural planning required to achieve even modest levels of success. These development considerations are critical and unforgiving, as the overwhelming majority of a distributed system’s details require a thorough understanding from the start. As an aid in this effort, most designers rely heavily on the large body of documented experience and research in distributed computing, which continues to grow today.

Many notable experts look to the early 1970s for the earliest distributed systems that were complete by definition and capable of being considered and implemented as a whole. Research and experimentation efforts began in earnest in the mid-to-late 1970s and continued into the early 1990s, with a few implementations achieving modest commercial success. The subject of distributed operating systems, however, has a much richer historical perspective when design issues are considered individually, with respect to some of the earliest strides towards distributed computing. There are several instances of fundamental and pioneering implementations of primitive distributed system and component concepts dating back to the early 1950s. Looking to the modern distributed system and its future, the accelerating proliferation of multiprocessor systems and multi-core processors has led to a re-emergence of the distributed system concept. The inherent challenges in many-core and multiprocessor science have led to an enormous increase in distributed-system-related research. Many of these research efforts investigate and describe interesting and plausible paradigms for the future of distributed computing.


Description

A distributed operating system is an operating system. This statement may be trivial, but it is not always obvious, because the distributed operating system is such an integral part of the distributed system. The idea is analogous to the consideration of a square. A square might not immediately be recognized as a rectangle. Although it possesses all the requisite attributes defining a rectangle, a square’s additional attributes and specific configuration provide a disguise. At its core, the distributed operating system provides only the essential services and minimal functionality required of an operating system, but its additional attributes and particular configuration make it different. The distributed operating system fulfills its role as operating system, and does so in a manner indistinguishable from a centralized, monolithic operating system. That is, although distributed in nature, it supports the system’s appearance as a singular, local entity.


An operating system, at a basic level, is expected to isolate and manage the physical complexities of lower-level hardware resources. In turn, these complexities are organized into simplified logical abstractions and presented to higher-level entities as interfaces into the underlying resources. These marshalling and presentation activities take place in a secure and protected environment, often referred to as the “system-level,” and describe a minimal scope of practical operating system functionality. In graphical depictions, however, most monolithic operating systems would be illustrated as a discrete container sandwiched between the local hardware resources below and application programs above. The operating system container would be filled with a robust complement of services and functions to support as many potential needs as possible or practical. This full-featured collection of services would reside and execute at the system-level and support higher, “user-level” applications and services.

A distributed operating system, illustrated in a similar fashion, would be a container suggesting minimal operating system functionality and scope. This container would completely cover all disseminated hardware resources, defining the system-level. The container would extend across the system, supporting a layer of modular software components existing in the user-level. These software components supplement the distributed system with a configurable set of added services, which a monolithic operating system would usually integrate within itself (and within the system-level). This division of minimal system-level function from additional user-level modular services provides a “separation of mechanism and policy.” Mechanism and policy can be simply interpreted as "how something is done" versus "what is to be done," respectively. Achieving this separation allows for an exceptionally loosely coupled, flexible, and scalable distributed system.
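
The separation of mechanism and policy described above can be illustrated with a small sketch. It is not drawn from any particular system: the names Dispatcher, fifo, and shortest_name_first are hypothetical. The dispatcher stands in for a system-level mechanism (it holds ready tasks and hands one out), while the selection rule is a user-level policy that can be swapped without touching the mechanism.

```python
from typing import Callable, List

# Hypothetical names throughout; this models the idea, not a real kernel.
# A policy receives the list of ready task names and picks one of them.
Policy = Callable[[List[str]], str]


class Dispatcher:
    """Mechanism: knows HOW to queue tasks and hand one out."""

    def __init__(self, policy: Policy) -> None:
        self._ready: List[str] = []
        self._policy = policy  # the policy is supplied from outside

    def add(self, task: str) -> None:
        self._ready.append(task)

    def dispatch(self) -> str:
        # The mechanism never decides WHICH task runs; it asks the policy.
        chosen = self._policy(self._ready)
        self._ready.remove(chosen)
        return chosen


def fifo(ready: List[str]) -> str:
    """Policy 1: run tasks in arrival order."""
    return ready[0]


def shortest_name_first(ready: List[str]) -> str:
    """Policy 2: an arbitrary alternative rule, for contrast only."""
    return min(ready, key=len)


for policy in (fifo, shortest_name_first):
    d = Dispatcher(policy)
    for task in ("compile", "ls", "backup"):
        d.add(task)
    print(policy.__name__, "->", d.dispatch())
```

The same Dispatcher code serves both runs; only the policy function differs, which is the loose coupling the paragraph attributes to this separation.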

Distributed computing models

The nature of distribution

The unique nature of the distributed operating system is both subtle and complex. A distributed operating system’s hardware infrastructure elements are not centralized; that is, the elements are not in close proximity to one another at a single location. A given distributed operating system’s structural elements could reside in various rooms within a building, or in various buildings around the world. This geographic dissemination defines its decentralization; however, the distributed operating system is a distributed system, not simply a decentralized one.

This distinction is the source of the subtlety and complexity. While decentralized systems and distributed systems are both spatially diverse, it is the specific manner and relative degree of linkage between the elements, or nodes, of the systems that differentiates the two. In the case of these two types of operating system, these linkages are the lines of communication between the nodes of the system.

Three basic distributions

To better illustrate this point, let us more closely reflect upon three system architectures: centralized, decentralized, and distributed. In this examination, we will consider three tightly related aspects of their structure: organization, connection, and control. Organization describes physical arrangement characteristics, connection involves associations among constituent structural entities, and control correlates the manner, necessity, and rationale of the earlier considerations.


 

 

 


Organization

Firstly, we consider the subject of organization. A centralized system is organized most simply: it has basically one real level of structure, and all constituent elements are highly influenced by and ultimately dependent upon this organization. The decentralized system is a more federated structure with multiple levels, where subsets of a system’s entities unite, and these entity collections in turn unite at higher levels, in the direction of and culminating at the central element. The distributed system has no discernible or necessary levels; it is purely an autonomous collection of discrete elements.

Connection

Association linkages between elements are the second consideration. In each case, physical association is inextricably linked (or not) to conceptual organization. The centralized system has its constituent members directly united to a central entity. One could picture holding a bunch of balloons, each on a string, with the hand being the central figure. A decentralized system incorporates a single-step direct, or multi-step indirect, path between any given constituent element and the central entity. This can be understood by thinking of a corporate organizational chart, with the first level connecting directly and lower levels connecting indirectly through successively higher levels (no lateral “dotted” lines). Finally, the distributed system has no inherent pattern; direct and indirect connections are possible between any two given elements of the system. Think of the 1970s phenomenon of “string art,” a spirograph drawing, a spider’s web, or the Interstate Highway System between U.S. cities.
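
The three connection patterns can be made concrete with a small graph sketch. The node names and the particular edges below are invented for illustration: a star for the centralized case (the balloons), a tree for the decentralized case (the organizational chart), and an arbitrary mesh for the distributed case. The helper functions make_graph, hops, and without are likewise hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected graph from (a, b) links."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def hops(graph: Graph, src: str, dst: str) -> int:
    """Breadth-first search; returns the hop count, or -1 if unreachable."""
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return -1


def without(graph: Graph, removed: str) -> Graph:
    """Copy of the graph with one node (and its links) taken away."""
    return {n: nbrs - {removed} for n, nbrs in graph.items() if n != removed}


# Centralized: every element hangs directly off one central entity.
centralized = make_graph([("hub", "a"), ("hub", "b"), ("hub", "c")])
# Decentralized: elements reach the center through intermediate levels.
decentralized = make_graph(
    [("hub", "m1"), ("hub", "m2"), ("m1", "a"), ("m1", "b"), ("m2", "c")]
)
# Distributed: no inherent pattern; any two elements may be linked.
distributed = make_graph(
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
)

print(hops(centralized, "a", "b"))                   # 2: via the hub
print(hops(decentralized, "a", "c"))                 # 4: up to the hub and down
print(hops(distributed, "a", "c"))                   # 1: direct link
print(hops(without(centralized, "hub"), "a", "b"))   # -1: no path without hub
```

In the two directed layouts every path between leaves passes through the center, so removing it disconnects them; in the mesh, removing a single node leaves the remaining elements connected in this example.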

Control

Notice that the centralized and decentralized systems have distinctly directed flows of connection towards the central entity, while the connections of the distributed system are in no way dictated by its organization. This is the pivotal notion of the third consideration. What correlations exist between a system’s organization and its associations? In all three cases, there is an extremely delicate balance between the administration of processes and the scope and extensibility of those processes; in essence, it is about the sphere of control. Simply put, in the directed systems there is more control, which eases administration of processes but constrains their possible scope. On the other hand, the distributed system is much more difficult to control, but is effectively limited in extensible scope only by the capabilities of that control. The associations of the distributed system conform to the needs of its processes, and not inherently in any way to its organizational configuration. Key collections of extended distributed operating system processes are discussed later in this article.

Conclusions

Lastly, as to the nature of the distributed system, it has been stated that a distributed operating system is not necessarily an operating system at all, but simply "is" the distributed system. This view is commonly justified by pointing to the operating system's deep and inextricable integration into the distributed system. Its absolute and singular focus on sustaining and maintaining the system is also used as rationale. However, it is important to remember the separation of mechanism and policy. The distributed operating system and its mechanisms are not affected by any degree of integration, and no amount of focus on providing these mechanisms changes the responsibility of policy, or the expectation of results at the distributed system level. As mentioned earlier, a square is a rectangle, and no level of effort exerted by the square in maintaining four equal sides changes that.

Architectural features

Transparency

Transparency is the attribute of a distributed operating system allowing it to appear as a unified, centralized, and local operating system. Many factors lend complexity to the concept of transparency in a distributed operating system (a system). Elements of a system are distributed spatially; a system’s software, its processes, and data are also distributed among these elements. Occasionally, elements need to communicate with other distant elements in the system. When a process asks a question of another process, it should not stand idly waiting for the answer; it should continue working productively. However, it should also remain alert for the answer; and receive it and process it immediately, to maintain the illusion of local elements. This added level of complexity is asynchronous communication. Communication time can become indefinite, when an element's connectivity is compromised, or an element itself fails. Connectivity and failure issues affect communication, but system processing is affected as well.
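
The asynchronous pattern described above (ask, keep working, stay alert for the answer, and cope with an answer that never arrives) can be sketched with Python's asyncio. This is only a single-process simulation: remote_query is a hypothetical stand-in for a distant element, and the delays and timeout values are arbitrary.

```python
import asyncio
from typing import Optional, Tuple


async def remote_query(delay: float) -> str:
    """Hypothetical remote element that answers after `delay` seconds."""
    await asyncio.sleep(delay)
    return "answer"


async def ask_and_keep_working(
    delay: float, timeout: float
) -> Tuple[Optional[str], int]:
    # Send the question without waiting for the reply.
    request = asyncio.create_task(remote_query(delay))
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    work_done = 0
    # Continue with local work, checking for the reply between steps.
    while not request.done() and loop.time() < deadline:
        work_done += 1              # one unit of productive local work
        await asyncio.sleep(0.01)   # yield so the reply can be delivered
    if request.done():
        return request.result(), work_done
    # The reply did not arrive in time: treat the peer as unreachable.
    request.cancel()
    return None, work_done


print(asyncio.run(ask_and_keep_working(delay=0.05, timeout=1.0)))
print(asyncio.run(ask_and_keep_working(delay=1.0, timeout=0.05)))
```

The first call returns the answer together with a count of the work done while waiting; the second returns None, modelling the case where connectivity is compromised and the caller must give up rather than wait indefinitely.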

To remain transparent, a system's elements may copy (replicate) portions of themselves onto collections of host elements. In times of need, a failed element's information can be retrieved from these host elements to continue processing, and eventually to reconstitute the faulty element. This too is added complexity, and it does not end here. This replication of information throughout the system requires coordination, and therefore a coordinator. The coordinator oversees many aspects of a system's operation, unless that coordinator fails. In this event, some other element must be chosen and constituted as coordinator. This process adds further complexity to the system. The complexity in the system can quickly add up, and these examples by no means sum to a total. Transparency envelops a system in an abstraction of extremely complex construction, but provides a user with a complete, consistent, and simplified local interface to hardware, devices, and resources. The various facets of a system contributing to this complexity are discussed individually, below.
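
Coordinator replacement can be sketched in a few lines. The rule used here, "the live node with the highest identifier becomes coordinator," is an assumption chosen for brevity; it is in the spirit of bully-style election but omits the message exchange a real election needs, since this model pretends every node has a perfect view of which nodes are alive. The Cluster class and its methods are hypothetical.

```python
from typing import Dict, Iterable, Optional


class Cluster:
    """Toy model with a global view of liveness (an unrealistic assumption)."""

    def __init__(self, node_ids: Iterable[int]) -> None:
        self.alive: Dict[int, bool] = {n: True for n in node_ids}
        self.coordinator: Optional[int] = None
        self.elect()

    def elect(self) -> Optional[int]:
        # Assumed rule: highest-numbered live node wins.
        live = [n for n, up in self.alive.items() if up]
        self.coordinator = max(live) if live else None
        return self.coordinator

    def fail(self, node_id: int) -> None:
        self.alive[node_id] = False
        # Only the loss of the coordinator itself triggers a new election.
        if node_id == self.coordinator:
            self.elect()


cluster = Cluster([1, 2, 3])
print(cluster.coordinator)  # 3
cluster.fail(3)
print(cluster.coordinator)  # 2
```

In an actual distributed system the hard part is exactly what this sketch assumes away: agreeing on who has failed and who has won when messages can be delayed or lost.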

Modularity

A distributed operating system is inherently modular by definition. However, a system's modularity speaks more to its composition and configuration, the rationale behind these, and ultimately their effectiveness. A system element could be composed of multiple layers of components. Each of these components might vary in granularity of subcomponent. These layers and component compositions would each have a coherent and rational configuration towards some purpose in the system. The purpose could be for a more simplified abstraction, raw communication efficiency, accommodating heterogeneous elements, processing parallelism and concurrency, or possibly to support an object-oriented programming paradigm. In any event, the scattered distribution of system elements is not random, but is most often the result of detailed design and careful planning.

Persistence of Entity state

 Existence is not time-bound; state persists regardless of breaks in system function
 Resides in nonvolatile storage; synchronized with the current, stable, active copy
 Subject to consistent and timely updates
 Able to survive hardware failure

Efficiency

 Many issues can adversely affect system performance:
 latency in interactions among distributed entities
 local response facade requires remote entities' state be cached locally
 and consistently synchronized to maintain the paradigm
 Workload variations, delays, interruptions, faults, and/or crashes of entities
 Distributed processing community assists when needed

Replication

 Duplication of state among selected distributed entities, and the synchronization of that state
 Remote communication required to effect synchronization

Reliability

 Inherent redundancy across the distributed entities provides fault-tolerance
 Consistent synchronized redundancy across N nodes, tolerates up to N-1 node faults
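
The N-1 figure above can be illustrated with a toy replicated value. The sketch assumes crash (fail-stop) faults only, that every write reaches all live replicas synchronously, and that crashed replicas never rejoin; none of these assumptions comes from the outline itself, and the ReplicatedValue class is hypothetical.

```python
from typing import Any, List


class ReplicatedValue:
    """One value copied to n replicas; crash faults only (assumed)."""

    def __init__(self, n: int) -> None:
        self.replicas: List[Any] = [None] * n
        self.up: List[bool] = [True] * n

    def write(self, value: Any) -> None:
        # Synchronous update: every live replica receives the new value.
        for i, is_up in enumerate(self.up):
            if is_up:
                self.replicas[i] = value

    def crash(self, i: int) -> None:
        self.up[i] = False

    def read(self) -> Any:
        # Any single surviving replica is enough to serve the read.
        for i, is_up in enumerate(self.up):
            if is_up:
                return self.replicas[i]
        raise RuntimeError("all replicas have failed")


value = ReplicatedValue(3)
value.write("state-v1")
value.crash(0)
value.crash(1)              # N-1 = 2 replicas are now down
print(value.read())         # state-v1
```

With three replicas, two crashes are survivable and the third is not. Faults in which a replica returns wrong data rather than stopping would require stronger techniques than this sketch shows.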

Flexibility

 OS has latitude in degree of exposure to externals
 Externals have latitude in degree of exposure they accept
 Coordination of process activity
 Where to run: near the user, near resources, on an available CPU, etc.

Scalability

 node expansion
 process migration

History

Pioneering inspirations

With a cursory glance around the internet, or a modest perusal of pertinent writings, one could very easily gain the notion that computer operating systems were a new phenomenon in the mid-twentieth century. In fact, important research in operating systems was already being conducted at this time.[10][11][12][13][14][15] Early exploration into operating systems took place in the years leading to 1950; shortly afterward, highly advanced research began on new systems to conquer new problems. In the first decade of the second half of the 20th century, many new questions were asked, many new problems were identified, and many solutions were developed that worked for years in controlled production environments.

Aboriginal Distributed Computing

The DYSEAC[16] (1954)

One of the first solutions to these new questions was the DYSEAC, a self-described general-purpose synchronous computer that, at this point in history, exhibited signs of being much more than general-purpose. In one of the earliest publications of the ACM, in April of 1954, a researcher at the National Bureau of Standards – now the National Institute of Standards and Technology (NIST) – presented a detailed implementation design specification of the DYSEAC. Without carefully reading the entire specification, one could be misled by summary language in the introduction as to the nature of this machine. The initial section of the introduction advises that major emphasis will be placed upon the requirements of the intended applications, and that these applications would require flexible communication. However, the suggestion that the external devices could be typewriters, magnetic media, and CRTs, together with repeated use of the term “input-output operation,” could quickly limit any view of this system to that of a complex centralized “ensemble.” Seemingly saving the best for last, the author eventually describes the true nature of the system.

Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.

— ALAN L. LEINER, System Specifications for the DYSEAC

While this more detailed description elevates the perception of the system, the best that can be distilled from it is some semblance of decentralized control. The avid reader who perseveres in the investigation reaches a point at which the real nature of the system is divulged.

Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …it should be noted that the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.

— ALAN L. LEINER, System Specifications for the DYSEAC

This is one of the earliest examples of a computer with distributed control. Dept. of the Army reports[17] show it was certified reliable and passed all acceptance tests in April of 1954. It was completed and delivered on time, in May of 1954. It was also a portable computer: it was housed in a tractor-trailer, and had 2 attendant vehicles and 6 tons of refrigeration capacity.

Multi-programming abstraction

The Lincoln TX-2[18] (1957)

Described as an input-output system of experimental nature, the Lincoln TX-2 placed a premium on flexibility in its association of simultaneously operational input-output devices. The design of the TX-2 was modular, supporting a high degree of modification and expansion, as well as flexibility in operating and programming of its devices. The system employed The Multiple-Sequence Program Technique.

This technique allowed for multiple program counters to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process, but also the control flow of sequences and switching of devices as well. Much discussion ensues related to the complexity and sophistication in the sequence capabilities of devices.

Similar to the previous system, the TX-2 discussion has a distinct decentralized theme until it is revealed that efficiencies in system operation are gained when separate programmed devices are operated simultaneously. It is also stated that the full power of the central unit can be utilized by any device; and it may be used for as long as the device's situation requires. In this, we see the TX-2 as another example of a system exhibiting distributed control, its central unit not having dedicated control.

Memory access abstraction

Intercommunicating Cells, Basis for a Distributed Logic Computer[19] (1962)

One early memory access paradigm was Intercommunicating Cells, where a cell is composed of a collection of memory elements. A memory element was basically an electronic flip-flop or relay, capable of two possible values. Within a cell there are two types of elements, symbol and cell elements. Each cell structure stores data in a string of symbols, consisting of a name and a set of associated parameters. Consequently, a system's information is linked through various associations of cells.

Intercommunicating Cells fundamentally breaks from tradition in that it has no counters or any concept of addressing memory. The theory contends that addressing is a wasteful and non-valuable level of indirection. Information is accessed in two ways, direct and cross-retrieval. Direct retrieval looks to a name and returns a parameter set. Cross-retrieval projects through parameter sets and returns a set of names containing the given subset of parameters. This would be similar to a modified hash table data structure that would allow for multiple values (parameters) for each key (name).
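
The hash-table analogy can be sketched directly. The class below is only a software model of the two retrieval modes, with hypothetical names (CellMemory, store, direct, cross); it says nothing about how the original hardware cells were built. In particular, cross-retrieval here is a linear scan over all entries, whereas the cellular design is described as retrieving in roughly constant time.

```python
from typing import Dict, Iterable, Set


class CellMemory:
    """Name -> set of parameters, with lookup in both directions."""

    def __init__(self) -> None:
        self._cells: Dict[str, Set[str]] = {}

    def store(self, name: str, parameters: Iterable[str]) -> None:
        # Multiple values (parameters) are kept for each key (name).
        self._cells.setdefault(name, set()).update(parameters)

    def direct(self, name: str) -> Set[str]:
        """Direct retrieval: a name yields its parameter set."""
        return set(self._cells.get(name, set()))

    def cross(self, parameters: Iterable[str]) -> Set[str]:
        """Cross-retrieval: names whose parameters include all given ones."""
        wanted = set(parameters)
        return {
            name for name, params in self._cells.items() if wanted <= params
        }


memory = CellMemory()
memory.store("apple", ["red", "fruit"])
memory.store("cherry", ["red", "fruit", "small"])
memory.store("brick", ["red"])

print(sorted(memory.direct("apple")))           # ['fruit', 'red']
print(sorted(memory.cross(["red", "fruit"])))   # ['apple', 'cherry']
```

Neither operation exposes an address or an index to the caller, which is the point of the analogy: information is reached by what it is associated with, not by where it is stored.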

Cellular memory would have many advantages:
  A major portion of a system's logic is distributed within the associations of information stored in the cells
  This flow of information association is somewhat guided by the act of storing and retrieving
  The time required for storage and retrieval is mostly constant and completely unrelated to the size and fill-factor of the memory
  Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size

This early research into alternative memory describes a configuration well suited to the distributed operating system. The constant-time projection through memory for storage and retrieval would be inherently atomic and exclusive. The cellular memory's intrinsic distributed characteristics would be an invaluable benefit; however, the impact on the user, hardware/device, or application programming interfaces is uncertain. It is clear that these early researchers had a distributed system concept in mind, as they state:

We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.

— Chung-Yeol (C. Y.) Lee, Intercommunicating Cells, Basis for a Distributed Logic Computer

Component abstraction

HYDRA: The Kernel of a Multiprocessor Operating System[20] (1974)
The design philosophy of HYDRA ... suggest that, at the heart of the system, one should build a collection of facilities of "universal applicability" and "absolute reliability" -- a set of mechanisms from which an arbitrary set of operating system facilities and policies can be conveniently, flexibly, efficiently, and reliably constructed.
Defining a kernel with all the attributes given above is difficult, and perhaps impractical... It is, nevertheless, the approach taken in the HYDRA system. Although we make no claim either that the set of facilities provided by the HYDRA kernel ... we do believe the set provides primitives which are both necessary and adequate for the construction of a large and interesting class of operating environments. It is our view that the set of functions provided by HYDRA will enable the user of C.mmp to create his own operating environment without being confined to predetermined command and file systems, execution scenarios, resource allocation policies, etc.

Initial composition

The National Software Works: A Distributed Processing System[21] (1975)

The National Software Works (NSW) is a significant new step in the development of distributed processing systems and computer networks. NSW is an ambitious project to link a set of geographically distributed and diverse hosts with an operating system which appears as a single entity to a prospective user.

Complete instantiation

The Roscoe Distributed Operating System[22] (1979)

Roscoe is an operating system implemented at the University of Wisconsin that allows a network of microcomputers to cooperate to provide a general-purpose computing facility. The goal of the Roscoe network is to provide a general-purpose computation resource in which individual resources such as files and processors are shared among processes and control is distributed in a non-hierarchical fashion. All processors are identical. Similarly, all processors run the same operating system kernel. However, they may differ in the peripheral units connected to them. No memory is shared between processors. All communication involves messages explicitly passed between physically connected processors. No assumptions are made about the topology of interconnection.

The decision not to use logical or physical sharing of memory for communication is influenced both by the constraints of currently available hardware and by our perception of cost bottlenecks likely to arise as the number of processors increases.

Foundational Work

Coherent memory abstraction

 Algorithms for scalable synchronization on shared-memory multiprocessors[23]
 A √N algorithm for mutual exclusion in decentralized systems[24]

File System abstraction

 Measurements of a distributed file system[25]
 Memory coherence in shared virtual memory systems[26]

Transaction abstraction

 Transactions
 Sagas[27]

 Transactional Memory
 Composable memory transactions[28]
 Transactional memory: architectural support for lock-free data structures[29]
 Software transactional memory for dynamic-sized data structures[30]
 Software transactional memory[31]

Persistence abstraction

 OceanStore: an architecture for global-scale persistent storage[32]

Coordinator abstraction

 Weighted voting for replicated data[33]
 Consensus in the presence of partial synchrony[34]

Reliability abstraction

 Sanity checks
 The Byzantine Generals Problem[35]
 Fail-stop processors: an approach to designing fault-tolerant computing systems[36]

 Recoverability
 Distributed snapshots: determining global states of distributed systems[37]
 Optimistic recovery in distributed systems[38]

Current Research

Replicated model extended to a component object model

 Architectural Design of E1 Distributed Operating System[39]
 The Cronus distributed operating system[40]
 Fine-grained mobility in the Emerald system[41]
 Design and development of MINIX distributed operating system[42]

Future Directions

Systems able to provide low-level complexity exposure, in proportion to trust and accepted responsibility

 Application performance and flexibility on exokernel systems.[43]
 Scale and performance in the Denali isolation kernel.[44]

Infrastructures focused on multi-processor/core processing

 The multikernel: a new OS architecture for scalable multicore systems.[45]
 Corey: an Operating System for Many Cores.[46]

Systems extending a consistent and stable impression of distributed processing over extremes in heterogeneity

 Helios: heterogeneous multiprocessing with satellite kernels.[47]

Systems able to provide effective, stable, and beneficial views of vastly increased complexity on multiple levels

 Tessellation

References

  1. ^ Tanenbaum, A. S. 1993. Distributed operating systems anno 1992. What have we learned so far? Distributed Systems Engineering 1, 1 (1993), 3-10.
  2. ^ Nutt, G. J. 1992. Centralized and Distributed Operating Systems. Prentice Hall Press.
  3. ^ a b Goscinski, A. 1991. Distributed Operating Systems: The Logical Design. 1st ed. Addison-Wesley Longman Publishing Co., Inc.
  4. ^ Fortier, P. J. 1986. Design of Distributed Operating Systems: Concepts and Technology. Intertext Publications, Inc./McGraw-Hill, Inc.
  5. ^ Pecheur, C. 1992. Using LOTOS for specifying the CHORUS distributed operating system kernel. Comput. Commun. 15, 2 (Mar. 1992), 93-102.
  6. ^ Habert, S. and Mosseri, L. 1990. COOL: kernel support for object-oriented environments. In Proceedings of the European Conference on Object-Oriented Programming on Object-Oriented Programming Systems, Languages, and Applications (Ottawa, Canada). OOPSLA/ECOOP '90. ACM, New York, NY, 269-275.
  7. ^ Sinha, P. K. 1996. Distributed Operating Systems: Concepts and Design. 1st ed. Wiley-IEEE Press.
  8. ^ Galli, D. L. 1999. Distributed Operating Systems: Concepts and Practice. 1st ed. Prentice Hall PTR.
  9. ^ Chow, R. and Chow, Y. 1997. Distributed Operating Systems and Algorithms. Addison-Wesley Longman Publishing Co., Inc.
  10. ^ Dreyfuss, P. 1958. System design of the Gamma 60. In Proceedings of the May 6-8, 1958, Western Joint Computer Conference: Contrasts in Computers (Los Angeles, California, May 06 - 08, 1958). IRE-ACM-AIEE '58 (Western). ACM, New York, NY, 130-133.
  11. ^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09 - 13, 1957). IRE-ACM-AIEE '57
  12. ^ Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.
  13. ^ Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.
  14. ^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335.
  15. ^ Estrin, G. 1960. Organization of computer systems: the fixed plus variable structure computer. In Papers Presented At the May 3-5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03 - 05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.
  16. ^ Leiner, A. L. 1954. System Specifications for the DYSEAC. J. ACM 1, 2 (Apr. 1954), 57-81.
  17. ^ Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961
  18. ^ Forgie, J. W. 1957. The Lincoln TX-2 input-output system. In Papers Presented At the February 26-28, 1957, Western Joint Computer Conference: Techniques For Reliability (Los Angeles, California, February 26 - 28, 1957). IRE-AIEE-ACM '57 (Western). ACM, New York, NY, 156-160.
  19. ^ Lee, C. Y. 1962. Intercommunicating cells, basis for a distributed logic computer. In Proceedings of the December 4-6, 1962, Fall Joint Computer Conference (Philadelphia, Pennsylvania, December 04 - 06, 1962). AFIPS '62 (Fall).
  20. ^ Wulf, W., Cohen, E., Corwin, W., Jones, A., Levin, R., Pierson, C., and Pollack, F. 1974. HYDRA: the kernel of a multiprocessor operating system. Commun. ACM 17, 6 (Jun. 1974), 337-345.
  21. ^ Millstein, R. E. 1977. The National Software Works: A distributed processing system. In Proceedings of the 1977 Annual Conference ACM '77. ACM, New York, NY, 44-52.
  22. ^ Solomon, M. H. and Finkel, R. A. 1979. The Roscoe distributed operating system. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79.
  23. ^ Mellor-Crummey, J. M. and Scott, M. L. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.
  24. ^ Maekawa, M. 1985. A √N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst. 3, 2 (May 1985), 145-159.
  25. ^ Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. Measurements of a distributed file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13 - 16, 1991). SOSP '91. ACM, New York, NY, 198-212.
  26. ^ Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.
  27. ^ Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27 - 29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.
  28. ^ Harris, T., Marlow, S., Peyton-Jones, S., and Herlihy, M. 2005. Composable memory transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15 - 17, 2005). PPoPP '05. ACM, New York, NY, 48-60.
  29. ^ Herlihy, M. and Moss, J. E. 1993. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16 - 19, 1993). ISCA '93. ACM, New York, NY, 289-300.
  30. ^ Herlihy, M., Luchangco, V., Moir, M., and Scherer, W. N. 2003. Software transactional memory for dynamic-sized data structures. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13 - 16, 2003). PODC '03. ACM, New York, NY, 92-101.
  31. ^ Shavit, N. and Touitou, D. 1995. Software transactional memory. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing (Ottawa, Ontario, Canada, August 20 - 23, 1995). PODC '95. ACM, New York, NY, 204-213.
  32. ^ Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Wells, C., and Zhao, B. 2000. OceanStore: an architecture for global-scale persistent storage. In Proceedings of the Ninth international Conference on Architectural Support For Programming Languages and Operating Systems (Cambridge, Massachusetts, United States). ASPLOS-IX. ACM, New York, NY, 190-201.
  33. ^ Gifford, D. K. 1979. Weighted voting for replicated data. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79. ACM, New York, NY, 150-162.
  34. ^ Dwork, C., Lynch, N., and Stockmeyer, L. 1988. Consensus in the presence of partial synchrony. J. ACM 35, 2 (Apr. 1988), 288-323.
  35. ^ Lamport, L., Shostak, R., and Pease, M. 1982. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.
  36. ^ Schlichting, R. D. and Schneider, F. B. 1983. Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238.
  37. ^ Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63-75.
  38. ^ Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3
  39. ^ L.B. Ryzhyk, A.Y. Burtsev. Architectural design of E1 distributed operating system. System Research and Information Technologies international scientific and technical journal, October 2004, Kiev, Ukraine.
  40. ^ Vinter, S. T. and Schantz, R. E. 1986. The Cronus distributed operating system. In Proceedings of the 2nd Workshop on Making Distributed Systems Work (Amsterdam, Netherlands, September 08 - 10, 1986). EW 2. ACM, New York, NY, 1-3.
  41. ^ Jul, E., Levy, H., Hutchinson, N., and Black, A. 1987. Fine-grained mobility in the Emerald system. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (Austin, Texas, United States, November 08 - 11, 1987). SOSP '87. ACM, New York, NY, 105-106.
  42. ^ Ramesh, K. S. 1988. Design and development of MINIX distributed operating system. In Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science (Atlanta, Georgia, United States). CSC '88. ACM, New York, NY, 685.
  43. ^ M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. 1997. Application performance and flexibility on exokernel systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), Saint-Malo, France, October 1997.
  44. ^ Whitaker, A., Shaw, M., and Gribble, S. D. 2002. Scale and performance in the Denali isolation kernel. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation.
  45. ^ Baumann, A., Barham, P., Dagand, P., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. 2009. The multikernel: a new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 - 14, 2009). SOSP '09.
  46. ^ S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. 2008. Corey: an Operating System for Many Cores. In Proceedings of the 2008 Symposium on Operating Systems Design and Implementation (OSDI), December 2008.
  47. ^ Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. 2009. Helios: heterogeneous multiprocessing with satellite kernels. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 - 14, 2009). SOSP '09.