Distributed operating system

This is an old revision of this page, as edited by JLSjr (talk | contribs) at 10:37, 7 April 2010.






A distributed operating system is the composite aggregate of operating system software in a distributed system. This software, considered collectively, supports the concerted operation of the system's independent, autonomous, and disseminated computational units. Each discrete unit contains an allocation of operating system software, which generally consists of two distinct portions. One portion, the kernel, is a minimal subset of system software, usually common to all units, devoted to controlling a unit's underlying hardware resources and devices; it also supports the unit's remaining portion of system software. That remaining system software, the local operating system, is an ad hoc collection of discrete system software components, formulated to empower the unit's operation. The kernel and the local operating system work collaboratively to ensure each unit's cooperative contribution to the whole system. This collective cooperation, sustained at the operating system level, gives the distributed system its unified appearance.

A diagram will be furnished to assist in illustrating this idea.

As a minimal software composition, a distributed operating system kernel is often referred to as a microkernel. The minimal nature of the kernel strongly enhances its modular potential, and the kernel's ubiquitous quality supports greater opportunity for system flexibility and scalability. Minimal by design, the microkernel usually contains only mechanisms and services which, if removed, would render the system incapable of operating. The microkernel primarily provides lower-level resource, process, communication, and I/O management. These services are made possible by the exposure of a comprehensive yet concise array of primitive mechanisms. Through these primitives, the microkernel supports the higher-level functionality of its local operating system. This clear distinction between the microkernel and the local operating system offers a clean separation of mechanism and policy: the difference between how something is done and why it is done, respectively.

A unit's operating system is primarily characterized by its composite nature. This system software composition is dictated mainly by the unit's responsibilities to the overall system. These responsibilities focus principally on the allocation, management, and disposition of system processes and resources in fulfillment of global system policy. The unit operating system participates largely through the kernel's communication services in fulfilling its system-wide responsibilities. This multi-level collaboration between a kernel and a unit operating system, and in turn between the units in a distributed system, is the key function of the distributed operating system. However, this function comes at a very high price.

A diagram will be furnished to assist in illustrating this idea.

The logical price of a distributed system, including its operating system, is calculated in terms of overcoming vast amounts of complexity on many levels and in many areas. This calculation also includes the depth and breadth of design investment and architectural planning required to achieve even modest levels of success. These development considerations are critical and unforgiving, in that an overwhelming majority of a distributed system's architectural, design, and implementation details are required from the start. A great deal of research into distributed computing has been done in the past, and continues today.

While research and implementation efforts came to initial commercial success from the late 1970s through the mid-1980s, the distributed system has a much longer history. There are several instances of interesting, innovative, and almost radical implementations of distributed system components dating back to the mid-1950s. The emergence and proliferation of multiprocessor systems and multi-core processors have led to a re-emergence of the distributed system and an enormous increase in quality research. Many of these research papers describe plausible paradigms for the future of distributed computing.

Description

A distributed operating system is an operating system. This statement may be trivial, but it is not always overt and obvious, because the distributed operating system is such an integral part of the distributed system. The idea is analogous to the consideration of a square: a square might not immediately be recognized as a rectangle. Although possessing all requisite attributes defining a rectangle, a square's additional attributes and specific configuration provide a disguise. At its core, the distributed operating system provides only the essential services and minimal functionality required of an operating system, but its additional attributes and particular configuration make it different. The distributed operating system fulfills its role as an operating system, and does so in a manner indistinguishable from a centralized, monolithic operating system. That is, although distributed in nature, it supports the system's appearance as a singular, local entity.

A diagram will be furnished to assist in illustrating this idea.

An operating system, at a basic level, is expected to isolate and manage the physical complexities of lower-level hardware resources. In turn, these complexities are organized into simplified logical abstractions and presented to higher-level entities as interfaces into the underlying resources. These marshalling and presentation activities take place in a secure and protected environment, often referred to as the "system level," and describe a minimal scope of practical operating system functionality. In graphical depictions, however, most monolithic operating systems would be illustrated as a discrete container sandwiched between the local hardware resources below and application programs above. The operating system container would be filled with a robust complement of services and functions to support as many potential needs as possible or practical. This full-featured collection of services would reside and execute at the system level and support higher, "user-level" applications and services.

A distributed operating system, illustrated in a similar fashion, would be a container suggesting minimal operating system functionality and scope. This container would completely cover all disseminated hardware resources, defining the system level. The container would extend across the system, supporting a layer of modular software components existing at the user level. These software components supplement the distributed system with a configurable set of added services, usually integrated within the monolithic operating system (and the system level). This division of minimal system-level function from additional user-level modular services provides a "separation of mechanism and policy." Mechanism and policy can be simply interpreted as "how something is done" versus "why something is done," respectively. Achieving this separation allows for an exceptionally loosely coupled, flexible, and scalable distributed system.
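The separation of mechanism and policy described above can be sketched in a few lines of Python. This is an illustrative toy, not a real kernel: the names Microkernel, FifoPolicy, and PriorityPolicy are hypothetical. The "kernel" exposes only primitive message mechanisms, while interchangeable user-level modules supply policy on top of them.

```python
class Microkernel:
    """Mechanism only: primitive send/receive over per-node queues."""
    def __init__(self):
        self.queues = {}          # node id -> list of pending messages

    def send(self, dest, msg):    # mechanism: enqueue a message
        self.queues.setdefault(dest, []).append(msg)

    def receive(self, node):      # mechanism: dequeue, or None if empty
        q = self.queues.get(node, [])
        return q.pop(0) if q else None


class FifoPolicy:
    """User-level policy: dispatch requests in arrival order."""
    def choose(self, pending):
        return pending[0]


class PriorityPolicy:
    """Alternative policy, swapped in without touching the kernel."""
    def choose(self, pending):
        return max(pending, key=lambda m: m["priority"])


kernel = Microkernel()
kernel.send("node-a", {"op": "read", "priority": 1})
kernel.send("node-a", {"op": "sync", "priority": 9})

# Drain node-a's queue via the kernel's primitive mechanism.
pending = []
while (m := kernel.receive("node-a")) is not None:
    pending.append(m)

print(FifoPolicy().choose(pending)["op"])      # arrival order -> "read"
print(PriorityPolicy().choose(pending)["op"])  # highest priority -> "sync"
```

Swapping FifoPolicy for PriorityPolicy changes why a message is dispatched first without any change to how messages move, which is the loose coupling the paragraph above claims for this division.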

Overview

An approach to describing the unique nature of DOS

The unique nature of the distributed operating system is both subtle and complex. A distributed operating system's hardware infrastructure elements are not centralized; that is, the elements do not have tight proximity to one another at a single ___location. A given distributed operating system's structural elements could reside in various rooms within a building, or in various buildings around the world. This spatial dissemination defines its decentralization; however, the distributed operating system is a distributed system, not simply a decentralized one.

This distinction is the source of the subtlety and complexity. While decentralized systems and distributed systems are both spatially diverse, it is the specific manner and relative degree of linkage between the elements, or nodes, of the systems that differentiates the two. In the case of these two types of operating system, these linkages are the lines of communication between the nodes of the system.

To better illustrate this point, let us more closely reflect upon these three system architectures: centralized, decentralized, and distributed. In this examination, we will consider three tightly related aspects of their structure: organization, connection, and control. Organization will describe physical arrangement characteristics, connection will involve associations among constituent structural entities, and control will relate the manner, necessity, and rationale of the earlier considerations.


Multiple diagrams will be furnished to assist in illustrating these ideas.

Firstly, we consider the subject of organization. A centralized system is organized most simply: basically one real level of structure, with all constituent elements highly influenced by and ultimately dependent upon this organization. The decentralized system is a more federated structure: multiple levels, where subsets of a system's entities unite, these entity collections in turn uniting at higher levels, in the direction of and culminating at the central element. The distributed system has no discernible or necessary levels; it is purely an autonomous collection of discrete elements.

Association linkages between elements will be the second consideration. In each case, physical association is inextricably linked (or not) to conceptual organization. The centralized system has its constituent members directly united to a central entity. One could conceptualize holding a bunch of balloons, each on a string, with the hand being the central figure. A decentralized system incorporates a single-step direct, or multi-step indirect, path between any given constituent element and the central entity. This can be understood by thinking of a corporate organizational chart: the first level connecting directly, and lower levels connecting indirectly through successively higher levels (no lateral "dotted" lines). Finally, the distributed system has no inherent pattern; direct and indirect connections are possible between any two given elements of the system. Think of the 1970s phenomenon of "string art," a Spirograph drawing, a spider's web, or the Interstate Highway System between U.S. cities.

Notice that the centralized and decentralized systems have distinctly directed flows of connection towards the central entity, while the distributed system is in no way specifically influenced by virtue of its organization. This is the pivotal notion of the third consideration: what correlations exist between a system's organization and its associations? In all three cases, it is an extremely delicate balance between the administration of processes and the scope and extensibility of those processes; in essence, it is about the sphere of control. Simply put, in the directed systems there is more control, easing administration of processes but constraining their possible scope. On the other hand, the distributed system is much more difficult to control, but its extensible scope is effectively limited only by the capabilities of that control. The associations of the distributed system conform to the needs of its processes, and not inherently in any way to its organizational configuration. Key collections of extended distributed operating system processes are discussed later in this article.

Lastly, as to the nature of the distributed system, some experts state that the distributed operating system is not an operating system at all, but just a distributed system, because of the attention required to maintain the system. This author, and by extension this article, will maintain the operating-system status of the distributed operating system, by both observation and vacuous proof. As mentioned earlier, a square is a rectangle; and no level of effort required on its behalf to maintain four equivalent dimensions affects that fact.

Architectural features

Transparency

Transparency is the attribute of a distributed operating system that allows it to appear as a unified, centralized, local operating system. Many factors lend complexity to the concept of transparency in a distributed operating system (a system). Elements of a system are distributed spatially; a system's software, its processes, and its data are also distributed among these elements. Occasionally, elements need to communicate with other, distant elements in the system. When a process asks a question of another process, it should not stand idly waiting for the answer; it should continue working productively. However, it should also remain alert for the answer, and receive and process it immediately, to maintain the illusion of local elements. This added level of complexity is asynchronous communication. Communication time can become indefinite when an element's connectivity is compromised, or when an element itself fails. Connectivity and failure issues affect communication, but system processing is affected as well.
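The asynchronous exchange described above can be sketched in Python. This is a hedged, single-machine model, not a real distributed protocol: remote_lookup stands in for a distant element, and the simulated latency and names are illustrative. The requesting node fires the query, keeps doing useful work, and picks up the reply as soon as it arrives.

```python
import queue
import threading
import time

replies = queue.Queue()

def remote_lookup(key):
    """Stand-in for a distant element answering after network delay."""
    time.sleep(0.05)                         # simulated latency
    replies.put((key, f"value-of-{key}"))

# Fire the request without blocking on it.
threading.Thread(target=remote_lookup, args=("config",)).start()

work_done = 0
answer = None
while answer is None:
    work_done += 1                           # stay productive while waiting
    try:
        answer = replies.get(timeout=0.01)   # poll briefly for the reply
    except queue.Empty:
        pass                                 # no answer yet; keep working

print(answer)       # ('config', 'value-of-config')
print(work_done)    # > 0: the node kept working in the meantime
```

The polling loop is the simplest way to show the idea; a real system would use callbacks or an event loop so the reply interrupts the work rather than being polled for.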

To remain transparent, a system's elements may copy (replicate) portions of themselves onto collections of host elements. In times of need, a failed element's information can be retrieved from these host elements to continue processing and, eventually, reconstitute the faulty element. This too is added complexity, and it does not end here. Replicating information throughout the system requires coordination, and therefore a coordinator. The coordinator oversees many aspects of a system's operation, unless that coordinator fails. In that event, some other element must be chosen and made coordinator, a process that adds further complexity to the system. The complexity in a system can quickly add up, and these examples by no means sum to a total. Transparency envelops a system in an abstraction of extremely complex construction, but provides a user with a complete, consistent, and simplified local interface to hardware, devices, and resources. The various facets of a system contributing to this complexity are discussed individually, below.
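Choosing a replacement coordinator, as described above, can be done many ways; a minimal sketch is the common convention (as in bully-style elections) of electing the highest-numbered surviving element. The function name and node ids here are illustrative, not from any particular system.

```python
def elect_coordinator(nodes, alive):
    """Return the highest-id node still alive, or None if none are."""
    survivors = [n for n in nodes if alive[n]]
    return max(survivors) if survivors else None

nodes = [1, 2, 3, 4, 5]
alive = {n: True for n in nodes}

coordinator = elect_coordinator(nodes, alive)
print(coordinator)            # 5

alive[5] = False              # the coordinator fails...
coordinator = elect_coordinator(nodes, alive)
print(coordinator)            # ...and 4 is elected in its place
```

A real election must also handle nodes disagreeing about who is alive, which is exactly the added complexity the paragraph above points to.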

Modularity

A distributed operating system is inherently modular by definition. However, a system's modularity speaks more to its composition and configuration, the rationale behind these, and ultimately their effectiveness. A system element could be composed of multiple layers of components, and each of these components might vary in subcomponent granularity. These layers and component compositions would each have a coherent and rational configuration towards some purpose in the system. The purpose could be a more simplified abstraction, raw communication efficiency, accommodation of heterogeneous elements, processing parallelism and concurrency, or possibly support for an object-oriented programming paradigm. In any event, the scattered distribution of system elements is not random, but is most often the result of detailed design and careful planning.

Persistence of Entity state

 Existence not time-bound; functions continuously, regardless of breaks in the system
 Resides in nonvolatile storage; synchronized with a current, stable, active copy
 Subject to consistent and timely updates
 Able to survive hardware failure

Efficiency

 Many issues can adversely affect system performance:
 Latency in interactions among distributed entities
 A local-response facade requires remote entities' state to be cached locally and consistently synchronized to maintain the paradigm
 Workload variations, delays, interruptions, faults, and/or crashes of entities
 The distributed processing community assists when needed

Replication

 Duplication of state among selected distributed entities, and the synchronization of that state
 Remote communication required to effect synchronization

Reliability

 Inherent redundancy across the distributed entities provides fault-tolerance
 Consistent synchronized redundancy across N nodes, tolerates up to N-1 node faults
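The N-1 fault-tolerance claim above can be modeled in a few lines of Python. This is an illustrative toy, not a real replication protocol (it ignores network partitions and write ordering): with the same state kept consistent on N nodes, a read still succeeds after up to N-1 node failures.

```python
class ReplicatedStore:
    """Toy model: full replication of state across N nodes."""
    def __init__(self, n_replicas):
        self.replicas = [dict() for _ in range(n_replicas)]
        self.up = [True] * n_replicas

    def write(self, key, value):
        for i, replica in enumerate(self.replicas):
            if self.up[i]:
                replica[key] = value    # consistent, synchronized copies

    def read(self, key):
        for i, replica in enumerate(self.replicas):
            if self.up[i]:
                return replica[key]     # any live replica can answer
        raise RuntimeError("all replicas failed")

store = ReplicatedStore(n_replicas=3)
store.write("x", 42)
store.up[0] = store.up[1] = False       # N-1 = 2 nodes fail
print(store.read("x"))                  # 42, served by the last live replica
```

The model also shows why the synchronization in the Replication section matters: if writes did not reach every live replica, the surviving node could answer with stale state.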

Flexibility

 The OS has latitude in its degree of exposure to externals
 Externals have latitude in the degree of exposure they accept
 Coordination of process activity
 Where to run: near the user? near resources? near available CPU? etc.

Scalability

 node expansion
 process migration

History

Pioneering inspirations

With a cursory glance around the internet, or a modest perusal of pertinent writings, one could very easily gain the notion that computer operating systems were a new phenomenon in the mid-twentieth century. In fact, important research in operating systems was already being conducted at that time.[1][2][3][4][5][6] While early exploration into operating systems took place in the years leading up to 1950, highly advanced research began shortly afterward on new systems to conquer new problems. In the first decade of the second half of the 20th century, many new questions were asked, many new problems were identified, and many solutions were developed and working for years in controlled production environments.

Aboriginal Distributed Computing

The DYSEAC[7] (1954)

One of the first solutions to these new questions was the DYSEAC, a self-described general-purpose synchronous computer that, at this point in history, exhibited signs of being much more than general-purpose. In one of the earliest publications of the ACM, in April 1954, a researcher at the National Bureau of Standards, now the National Institute of Standards and Technology (NIST), presented a detailed implementation design specification of the DYSEAC. Without carefully reading the entire specification, one could be misled by summary language in the introduction as to the nature of this machine. The initial section of the introduction advises that major emphasis would be focused upon the requirements of the intended applications, and that these applications would require flexible communication. However, the suggestion that the external devices could be typewriters, magnetic media, and CRTs, together with the term "input-output operation" being used more than once, could quickly limit any paradigm of this system to a complex, centralized "ensemble." Seemingly saving the best for last, the author eventually describes the true nature of the system.

Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.

— ALAN L. LEINER, System Specifications for the DYSEAC

While this more detailed description elevates the perception of the system, the best that can be distilled from it is some semblance of decentralized control. The avid reader, persevering in the investigation, would reach a point at which the real nature of the system is divulged.

Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …it should be noted that the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.

— ALAN L. LEINER, System Specifications for the DYSEAC

This is one of the earliest examples of a computer with distributed control. Department of the Army reports[8] show it was certified reliable and passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. It was, in addition, a portable computer: it was housed in a tractor-trailer and had two attendant vehicles and six tons of refrigeration capacity.

Multi-programming abstraction

The Lincoln TX-2[9] (1957)

Described as an experimental input-output system, the Lincoln TX-2 placed a premium on flexibility in its association of simultaneously operational input-output devices. The design of the TX-2 was modular, supporting a high degree of modification and expansion, as well as flexibility in the operation and programming of its devices. The system employed the Multiple-Sequence Program Technique.

This technique allowed multiple program counters to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process but also the control flow of sequences and the switching of devices. Much discussion ensues related to the complexity and sophistication in the sequencing capabilities of devices.

Similar to the previous system, the TX-2 discussion has a distinctly decentralized theme until it is revealed that efficiencies in system operation are gained when separately programmed devices are operated simultaneously. It is also stated that the full power of the central unit can be utilized by any device, and may be used for as long as the device's situation requires. In this, we see the TX-2 as another example of a system exhibiting distributed control, its central unit not having dedicated control.

Memory access abstraction

Intercommunicating Cells, Basis for a Distributed Logic Computer[10] (1962)

One early memory access paradigm was Intercommunicating Cells, where a cell is composed of a collection of memory elements. A memory element was basically an electronic flip-flop or relay, capable of holding one of two possible values. Within a cell there are two types of elements: symbol elements and cell elements. Each cell structure stores data in a string of symbols, consisting of a name and a set of associated parameters. Consequently, a system's information is linked through various associations of cells.

Intercommunicating Cells fundamentally breaks from tradition in that it has no counters or any concept of addressing memory. The theory contends that addressing is a wasteful and non-valuable level of indirection. Information is accessed in two ways: direct and cross-retrieval. Direct retrieval looks to a name and returns a parameter set. Cross-retrieval projects through parameter sets and returns a set of names containing the given subset of parameters. This would be similar to a modified hash table data structure, allowing multiple values (parameters) for each key (name).
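The two retrieval modes can be modeled as the modified hash table the text suggests. This is a hedged sketch of the idea only; the cell names and parameters below are invented for illustration, and real cellular memory performed cross-retrieval in hardware, in parallel across all cells.

```python
# name -> parameter set, standing in for the cells' stored symbols
cells = {
    "alpha": {"red", "fast", "small"},
    "beta":  {"red", "slow"},
    "gamma": {"blue", "fast", "small"},
}

def direct(name):
    """Direct retrieval: a name returns its parameter set."""
    return cells[name]

def cross(params):
    """Cross-retrieval: a parameter subset returns all matching names."""
    return {n for n, p in cells.items() if params <= p}

print(sorted(direct("alpha")))           # ['fast', 'red', 'small']
print(sorted(cross({"red"})))            # ['alpha', 'beta']
print(sorted(cross({"fast", "small"})))  # ['alpha', 'gamma']
```

Note that neither function ever uses an address or counter: content alone drives retrieval, which is the break from tradition the paragraph describes.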

Cellular memory would have many advantages:
  A major portion of a system's logic is distributed within the associations of information stored in the cells
  This flow of information association is somewhat guided by the act of storing and retrieving
  The time required for storage and retrieval is mostly constant and completely unrelated to the size and fill-factor of the memory
  Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size

This early research into alternative memory describes a configuration ideal for the distributed operating system. The constant-time projection through memory for storage and retrieval would be inherently atomic and exclusive. Cellular memory's intrinsic distributed characteristics would be an invaluable benefit; however, the impact on the user, hardware/device, or application programming interfaces is uncertain. It is clear that these early researchers had a distributed system concept in mind, as they state:

We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.

— Chung-Yeol (C. Y.) Lee, Intercommunicating Cells, Basis for a Distributed Logic Computer

Component abstraction

HYDRA:The Kernel of a Multiprocessor Operating System[11] (1974)
The design philosophy of HYDRA ... suggest that, at the heart of the system, one should build a collection of facilities of "universal applicability" and "absolute reliability" -- a set of mechanisms from which an arbitrary set of operating system facilities and policies can be conveniently, flexibly, efficiently, and reliably constructed.
Defining a kernel with all the attributes given above is difficult, and perhaps impractical... It is, nevertheless, the approach taken in the HYDRA system. Although we make no claim either that the set of facilities provided by the HYDRA kernel ... we do believe the set provides primitives which are both necessary and adequate for the construction of a large and interesting class of operating environments. It is our view that the set of functions provided by HYDRA will enable the user of C.mmp to create his own operating environment without being confined to predetermined command and file systems, execution scenarios, resource allocation policies, etc.

Initial composition

The National Software Works: A Distributed Processing System[12] (1975)

The National Software Works (NSW) is a significant new step in the development of distributed processing systems and computer networks. NSW is an ambitious project to link a set of geographically distributed and diverse hosts with an operating system which appears as a single entity to a prospective user.

Complete instantiation

The Roscoe Distributed Operating System[13] (1979)

Roscoe is an operating system implemented at the University of Wisconsin that allows a network of microcomputers to cooperate to provide a general-purpose computing facility. The goal of the Roscoe network is to provide a general-purpose computation resource in which individual resources such as files and processors are shared among processes and control is distributed in a non-hierarchical fashion. All processors are identical. Similarly, all processors run the same operating system kernel. However, they may differ in the peripheral units connected to them. No memory is shared between processors. All communication involves messages explicitly passed between physically connected processors. No assumptions are made about the topology of interconnection.

The decision not to use logical or physical sharing of memory for communication is influenced both by the constraints of currently available hardware and by our perception of cost bottlenecks likely to arise as the number of processors increases.

Foundational Work

Coherent memory abstraction

 Algorithms for scalable synchronization on shared-memory multiprocessors[14]
 A √N algorithm for mutual exclusion in decentralized systems[15]

File System abstraction

 Measurements of a distributed file system[16]
 Memory coherence in shared virtual memory systems[17]

Transaction abstraction

 Transactions
 Sagas[18]

 Transactional Memory
 Composable memory transactions[19]
 Transactional memory: architectural support for lock-free data structures[20]
 Software transactional memory for dynamic-sized data structures[21]
 Software transactional memory[22]

Persistence abstraction

 OceanStore: an architecture for global-scale persistent storage[23]

Coordinator abstraction

 Weighted voting for replicated data[24]
 Consensus in the presence of partial synchrony[25]

Reliability abstraction

 Sanity checks
 The Byzantine Generals Problem[26]
 Fail-stop processors: an approach to designing fault-tolerant computing systems[27]

 Recoverability
 Distributed snapshots: determining global states of distributed systems[28]
 Optimistic recovery in distributed systems[29]

Current Research

replicated model extended to a component object model

 Architectural Design of E1 Distributed Operating System[30]
 The Cronus distributed operating system[31]
 Fine-grained mobility in the emerald system[32]
 Design and development of MINIX distributed operating system[33]

Future Directions

Systems able to provide low-level complexity exposure, in proportion to trust and accepted responsibility

 Application performance and flexibility on exokernel systems.[34]
 Scale and performance in the Denali isolation kernel.[35]

Infrastructures focused on multi-processor/core processing

 The multikernel: a new OS architecture for scalable multicore systems.[36]
 Corey: an Operating System for Many Cores.[37]

Systems extending a consistent and stable impression of distributed processing over extremes in heterogeneity

 Helios: heterogeneous multiprocessing with satellite kernels.[38]

Systems able to provide effective, stable, and beneficial views of vastly increased complexity on multiple levels

 Tessellation

References

  1. ^ Dreyfuss, P. 1958. System design of the Gamma 60. In Proceedings of the May 6-8, 1958, Western Joint Computer Conference: Contrasts in Computers (Los Angeles, California, May 06 - 08, 1958). IRE-ACM-AIEE '58 (Western). ACM, New York, NY, 130-133.
  2. ^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09 - 13, 1957). IRE-ACM-AIEE '57
  3. ^ Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.
  4. ^ Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.
  5. ^ Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335.
  6. ^ Estrin, G. 1960. Organization of computer systems: the fixed plus variable structure computer. In Papers Presented At the May 3-5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03 - 05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.
  7. ^ Leiner, A. L. 1954. System Specifications for the DYSEAC. J. ACM 1, 2 (Apr. 1954), 57-81.
  8. ^ Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961
  9. ^ Forgie, J. W. 1957. The Lincoln TX-2 input-output system. In Papers Presented At the February 26-28, 1957, Western Joint Computer Conference: Techniques For Reliability (Los Angeles, California, February 26 - 28, 1957). IRE-AIEE-ACM '57 (Western). ACM, New York, NY, 156-160.
  10. ^ Lee, C. Y. 1962. Intercommunicating cells, basis for a distributed logic computer. In Proceedings of the December 4-6, 1962, Fall Joint Computer Conference (Philadelphia, Pennsylvania, December 04 - 06, 1962). AFIPS '62 (Fall).
  11. ^ Wulf, W., Cohen, E., Corwin, W., Jones, A., Levin, R., Pierson, C., and Pollack, F. 1974. HYDRA: the kernel of a multiprocessor operating system. Commun. ACM 17, 6 (Jun. 1974), 337-345.
  12. ^ Millstein, R. E. 1977. The National Software Works: A distributed processing system. In Proceedings of the 1977 Annual Conference ACM '77. ACM, New York, NY, 44-52.
  13. ^ Solomon, M. H. and Finkel, R. A. 1979. The Roscoe distributed operating system. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79.
  14. ^ Mellor-Crummey, J. M. and Scott, M. L. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.
  15. ^ Maekawa, M. 1985. A √N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst. 3, 2 (May. 1985), 145-159.
  16. ^ Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. Measurements of a distributed file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13 - 16, 1991). SOSP '91. ACM, New York, NY, 198-212.
  17. ^ Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.
  18. ^ Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27 - 29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.
  19. ^ Harris, T., Marlow, S., Peyton-Jones, S., and Herlihy, M. 2005. Composable memory transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15 - 17, 2005). PPoPP '05. ACM, New York, NY, 48-60.
  20. ^ Herlihy, M. and Moss, J. E. 1993. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16 - 19, 1993). ISCA '93. ACM, New York, NY, 289-300.
  21. ^ Herlihy, M., Luchangco, V., Moir, M., and Scherer, W. N. 2003. Software transactional memory for dynamic-sized data structures. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13 - 16, 2003). PODC '03. ACM, New York, NY, 92-101.
  22. ^ Shavit, N. and Touitou, D. 1995. Software transactional memory. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing (Ottawa, Ontario, Canada, August 20 - 23, 1995). PODC '95. ACM, New York, NY, 204-213.
  23. ^ Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Wells, C., and Zhao, B. 2000. OceanStore: an architecture for global-scale persistent storage. In Proceedings of the Ninth international Conference on Architectural Support For Programming Languages and Operating Systems (Cambridge, Massachusetts, United States). ASPLOS-IX. ACM, New York, NY, 190-201.
  24. ^ Gifford, D. K. 1979. Weighted voting for replicated data. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79. ACM, New York, NY, 150-162
  25. ^ Dwork, C., Lynch, N., and Stockmeyer, L. 1988. Consensus in the presence of partial synchrony. J. ACM 35, 2 (Apr. 1988), 288-323.
  26. ^ Lamport, L., Shostak, R., and Pease, M. 1982. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.
  27. ^ Schlichting, R. D. and Schneider, F. B. 1983. Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238.
  28. ^ Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63-75.
  29. ^ Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3
  30. ^ L.B. Ryzhyk, A.Y. Burtsev. Architectural design of E1 distributed operating system. System Research and Information Technologies international scientific and technical journal, October 2004, Kiev, Ukraine.
  31. ^ Vinter, S. T. and Schantz, R. E. 1986. The Cronus distributed operating system. In Proceedings of the 2nd Workshop on Making Distributed Systems Work (Amsterdam, Netherlands, September 08 - 10, 1986). EW 2. ACM, New York, NY, 1-3.
  32. ^ Jul, E., Levy, H., Hutchinson, N., and Black, A. 1987. Fine-grained mobility in the emerald system. In Proceedings of the Eleventh ACM Symposium on Operating Systems Principles (Austin, Texas, United States, November 08 - 11, 1987). SOSP '87. ACM, New York, NY, 105-106.
  33. ^ Ramesh, K. S. 1988. Design and development of MINIX distributed operating system. In Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science (Atlanta, Georgia, United States). CSC '88. ACM, New York, NY, 685.
  34. ^ M. Frans Kaashoek, Dawson R. Engler, Gregory R. Ganger, Héctor M. Briceño, Russell Hunt, David Mazières, Thomas Pinckney, Robert Grimm, John Jannotti, and Kenneth Mackenzie. 1997. Application performance and flexibility on exokernel systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), Saint-Malô, France, October 1997.
  35. ^ Whitaker, A., Shaw, M., and Gribble, S. D. 2002. Scale and performance in the Denali isolation kernel. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation (OSDI '02).
  36. ^ Baumann, A., Barham, P., Dagand, P., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. 2009. The multikernel: a new OS architecture for scalable multicore systems. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 - 14, 2009). SOSP '09.
  37. ^ S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kaashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. 2008. Corey: an operating system for many cores. In Proceedings of the 2008 Symposium on Operating Systems Design and Implementation (OSDI), December 2008.
  38. ^ Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. 2009. Helios: heterogeneous multiprocessing with satellite kernels. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11 - 14, 2009). SOSP '09.