Distributed operating system: Difference between revisions

{{Short description|Operating system designed to operate on multiple systems over a computer network}}
A '''distributed operating system''' is system software over a collection of independent, [[Computer network|networked]], [[Inter-process communication|communicating]], and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs.<ref name="Tanenbaum1993">{{cite journal |last=Tanenbaum |first=Andrew S |date=September 1993 |title=Distributed operating systems anno 1992. What have we learned so far? |journal=Distributed Systems Engineering |volume=1 |issue=1 |pages=3–10 |doi=10.1088/0967-1846/1/1/001|bibcode=1993DSE.....1....3T |doi-access=free }}</ref> Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners.<ref name="Nutt1992">{{cite book|last=Nutt|first=Gary J.|title=Centralized and Distributed Operating Systems|url=https://archive.org/details/centralizeddistr0000nutt |url-access=registration|year=1992|publisher=Prentice Hall|isbn=978-0-13-122326-4}}</ref> The first is a ubiquitous minimal [[kernel (operating system)|kernel]], or [[microkernel]], that directly controls that node's hardware. The second is a higher-level collection of ''system management components'' that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications.<ref name="Gościński1991">{{cite book|last=Gościński|first=Andrzej|title=Distributed Operating Systems: The Logical Design|url=https://books.google.com/books?id=ZnYhAQAAIAAJ|year=1991|publisher=Addison-Wesley Pub. Co.|isbn=978-0-201-41704-3}}</ref>
 
The microkernel and the management components collection work together. They support the system's goal of integrating multiple resources and processing functionality into an efficient and stable system.<ref name="Fortier1986">{{cite book|last=Fortier|first=Paul J.|title=Design of Distributed Operating Systems: Concepts and Technology|url=https://books.google.com/books?id=F7QmAAAAMAAJ|year=1986|publisher=Intertext Publications|isbn=9780070216211}}</ref> This seamless integration of individual nodes into a global system is referred to as ''transparency'', or ''[[single system image]]'': the illusion, presented to users, that the global system is a single computational entity.
<hr style="width: 80%; height: 2px;">
{{TOC limit|3}}
 
==Description==
A '''distributed operating system''' is the minimal subset of software within a distributed system that, considered collectively, provides all the operating system services required to support higher-level components and to sustain the system.
[[File:OS-structure2.svg|thumb|right|400px|Structure of monolithic kernel, microkernel and hybrid kernel-based operating systems]]
A distributed OS provides the essential services and functionality required of an OS but adds attributes and particular [[Computer configuration|configurations]] to allow it to support additional requirements such as increased scale and availability. To a user, a distributed OS works in a manner similar to a single-node, [[Monolithic kernel|monolithic operating system]]. That is, although it consists of multiple nodes, it appears to users and applications as a single node.
 
Separating minimal system-level functionality from additional user-level modular services provides a "[[separation of mechanism and policy]]". Mechanism and policy can be simply interpreted as "how something is done" versus "what should be done", respectively. This separation increases flexibility and scalability.
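The distinction can be sketched in a few lines of code (an illustrative toy with invented names, not the interface of any real kernel): the dispatcher below is pure mechanism, while interchangeable policy objects decide what runs next.

```python
from collections import deque

# Mechanism (system level): knows HOW to dispatch a process,
# but never decides WHICH process should run next.
class Dispatcher:
    def __init__(self, policy):
        self.policy = policy              # policy is pluggable

    def schedule(self, ready_processes):
        chosen = self.policy.select(ready_processes)
        return f"dispatched {chosen}"     # stand-in for a real context switch

# Policies (user level): decide WHAT should be done next.
class FIFOPolicy:
    def select(self, ready):
        return ready.popleft()            # oldest process first

class PriorityPolicy:
    def select(self, ready):
        best = max(ready, key=lambda p: p[1])  # highest priority value wins
        ready.remove(best)
        return best

print(Dispatcher(FIFOPolicy()).schedule(deque([("A", 1), ("B", 9)])))
print(Dispatcher(PriorityPolicy()).schedule(deque([("A", 1), ("B", 9)])))
```

Swapping `FIFOPolicy` for `PriorityPolicy` changes scheduling behavior without touching the dispatch mechanism; that independence is the flexibility the separation buys.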
<br />
{{Userbox
|border-c=#000
|border-s=1
|id-c=LightSteelBlue
|id-s=12
|id-fc=#000
|info-c=#fff
|info-s=8
|info-fc=#000
|id=[[Image:Nuvola apps kchart.svg|40px]]
|info=A Diagram will be furnished to assist in illustration of this idea.
|float = right}}
==Overview==
An operating system, at a basic level, is expected to isolate and manage the physical complexities of lower-level [[hardware]] resources. In turn, these complexities are organized into simplified logical [[abstractions]] and presented to higher-level entities as [[Interface (computer science)|interfaces]] into the underlying [[Resource (computer science)|resources]]. These marshalling and presentation activities take place in a secure and protected environment, often referred to as the "[[Supervisor mode#Supervisor mode|system-level]]", and describe a minimal scope of practical operating system functionality. In graphical depictions, however, most monolithic operating systems would be illustrated as a discrete container sandwiched between the local hardware resources below and application programs above. The operating system container would be filled with a robust complement of services and functions to support as many potential needs as possible or practical. This full-featured collection of services would reside and execute at the system-level and support higher, "[[User space|user-level]]" applications and services.

A distributed operating system, illustrated in a similar fashion, would be a container suggesting [[Microkernel#Essential components and minimality|minimal operating system functionality and scope]]. This container would completely cover all disseminated hardware resources, defining the system-level, and would extend across the system, supporting a layer of modular software components existing in the user-level. These software components supplement the distributed system with a configurable set of added services, usually integrated within the monolithic operating system (and the system-level). This division of minimal system-level function from additional user-level modular services provides a "[[separation of mechanism and policy]]", allowing for an exceptionally loosely coupled, flexible, and scalable distributed system.
 
===The kernel===
At each [[Locale (computer hardware)|locale]] (typically a node), the kernel provides a minimally complete set of node-level utilities necessary for operating a node's underlying hardware and resources. These mechanisms include allocation, management, and disposition of a node's resources, processes, communication, and [[input/output]] management support functions.<ref name="Hansen2001">{{cite book|editor=Hansen, Per Brinch|title=Classic Operating Systems: From Batch Processing to Distributed Systems|url=https://books.google.com/books?id=-PDPBvIPYBkC|year=2001|publisher=Springer|isbn=978-0-387-95113-3}}</ref> Within the kernel, the communications sub-system is of foremost importance for a distributed OS.<ref name="Gościński1991"/>
 
In a distributed OS, the kernel often supports a minimal set of functions, including low-level [[address space]] management, [[thread (computing)|thread]] management, and [[inter-process communication]] (IPC). A kernel of this design is referred to as a [[microkernel]].<ref>Using LOTOS for specifying the CHORUS distributed operating system kernel Pecheur, C. 1992. Using LOTOS for specifying the CHORUS distributed operating system kernel. Comput. Commun. 15, 2 (Mar. 1992), 93-102.</ref><ref>COOL: kernel support for object-oriented environments Habert, S. and Mosseri, L. 1990. COOL: kernel support for object-oriented environments. In Proceedings of the European Conference on Object-Oriented Programming on Object-Oriented Programming Systems, Languages, and Applications (Ottawa, Canada). OOPSLA/ECOOP '90. ACM, New York, NY, 269-275.</ref> Its modular nature enhances reliability and security, essential features for a distributed OS.<ref name="Sinha1997">{{cite book|last=Sinha|first=Pradeep Kumar |title=Distributed Operating Systems: Concepts and Design|url=https://archive.org/details/distributedopera0000sinh|url-access=registration|year=1997|publisher=IEEE Press|isbn=978-0-7803-1119-0}}</ref>
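A minimal sketch of this idea (hypothetical classes and names, far simpler than real microkernels such as CHORUS or COOL): each node's kernel exposes only send/receive IPC primitives, and higher-level behavior is composed from those two calls.

```python
import queue

# A toy per-node "microkernel": its entire system-level API is IPC.
# Address-space and thread management are omitted for brevity.
class Microkernel:
    registry = {}                     # node_id -> kernel (simulated network)

    def __init__(self, node_id):
        self.node_id = node_id
        self.mailbox = queue.Queue()
        Microkernel.registry[node_id] = self

    def send(self, dest, message):    # mechanism: move a message between nodes
        Microkernel.registry[dest].mailbox.put((self.node_id, message))

    def receive(self):                # mechanism: deliver the next message
        return self.mailbox.get()

# A user-level exchange built purely on the kernel's IPC primitives.
a, b = Microkernel("A"), Microkernel("B")
a.send("B", "ping")
sender, msg = b.receive()
b.send(sender, msg.replace("ping", "pong"))
reply = a.receive()
print(reply)                          # ('B', 'pong')
```

Everything above the `send`/`receive` surface, including the reply logic, lives at user level, which is exactly the minimality the microkernel design aims for.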
 
[[Image:System Management Components.PNG|thumbnail|right|175px|alt=General overview of system management components that reside above the microkernel.|System management components overview]]
 
===System management===
System management components are software processes that define the node's ''policies''. These components are the part of the OS outside the kernel. These components provide higher-level communication, process and resource management, reliability, performance and security. The components match the functions of a single-entity system, adding the transparency required in a distributed environment.<ref name="Gościński1991"/>
<br />
{{Userbox
|border-c=#000
|border-s=1
|id-c=LightSteelBlue
|id-s=12
|id-fc=#000
|info-c=#fff
|info-s=8
|info-fc=#000
|id=<br />[[Image:Nuvola apps kchart.svg|40px]]<br /><br />[[Image:Nuvola apps kchart.svg|40px]]<br /><br />[[Image:Nuvola apps kchart.svg|40px]]<br /><br />
|info=Multiple Diagrams<br /><br />will be furnished to assist<br /><br />in illustration of these ideas.
|float = right}}
 
The distributed nature of the OS requires additional services to support a node's responsibilities to the global system. In addition, the system management components accept the "defensive" responsibilities of reliability, availability, and persistence. These responsibilities can conflict with each other: for example, replicating data onto more nodes improves availability but increases the cost of keeping the copies consistent, so each additional replica yields [[diminishing returns]]. Separation of policy and mechanism mitigates such conflicts.<ref name="Chow1997">{{cite book|last1=Chow|first1=Randy|author2=Theodore Johnson|title=Distributed Operating Systems and Algorithms|url=https://books.google.com/books?id=J4MZAQAAIAAJ|year=1997|publisher=Addison Wesley|isbn=978-0-201-49838-7}}</ref>
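The point of diminishing returns can be made concrete with a toy model (assumed numbers, not drawn from the cited sources): if each replica of a datum is independently reachable with probability 0.9, every added copy yields a smaller availability gain, while the per-update synchronization cost grows linearly.

```python
def availability(p, n):
    """Probability that at least one of n independent replicas is reachable."""
    return 1 - (1 - p) ** n

p = 0.9
for n in range(1, 5):
    gain = availability(p, n) - availability(p, n - 1)
    # update cost is modeled as one synchronization message per replica
    print(f"replicas={n}  availability={availability(p, n):.4f}  "
          f"marginal gain={gain:.4f}  update cost={n}")
```

The second replica adds 0.09 of availability; the fourth adds less than 0.001, yet still raises the cost of every update.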
 
===Working together as an operating system===
The architecture and design of a distributed operating system must realize both individual node and global system goals. Architecture and design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide an efficient and reliable distributed computing framework that requires minimal user awareness of the underlying command and control efforts.<ref name="Sinha1997" />
 
The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system, is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity.
 
===The price of complexity===
In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system an anathema to any user. As such, the logical price of realizing a distributed operating system must be calculated in terms of overcoming vast amounts of complexity in many areas, and on many levels. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation.<ref>Surajbali, B., Coulson, G., Greenwood, P., and Grace, P. 2007. Augmenting reflective middleware with an aspect orientation support layer. In Proceedings of the 6th international Workshop on Adaptive and Reflective Middleware: Held At the ACM/IFIP/USENIX international Middleware Conference (Newport Beach, CA, November 26–30, 2007). ARM '07. ACM, New York, NY, 1-6.</ref>

These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system's overall architectural and design detail is required at an exceptionally early point.<ref name="Tanenbaum1993"/> An exhausting array of design considerations is inherent in the development of a distributed operating system, and each of these considerations can potentially affect many of the others to a significant degree. This leads to a massive effort to approach the individual design considerations, and many of their permutations, in a balanced way. As an aid in this effort, most rely on documented experience and research in distributed computing.

== Architectural features ==
=== Transparency ===
 
==History==
Transparency is the attribute of a distributed operating system allowing it to appear as a unified, centralized, and local operating system. Many factors lend complexity to the concept of transparency in a distributed operating system (a system). Elements of a system are distributed spatially; a system’s software, its processes, and data are also distributed among these elements. Occasionally, elements need to communicate with other distant elements in the system. When a process asks a question of another process, it should not stand idly waiting for the answer; it should continue working productively. However, it should also remain alert for the answer; and receive it and process it immediately, to maintain the illusion of local elements. This added level of complexity is asynchronous communication. Communication time can become indefinite, when an element's connectivity is compromised, or an element itself fails. Connectivity and failure issues affect communication, but system processing is affected as well.
Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success.
 
Fundamental and pioneering implementations of primitive distributed operating system component concepts date to the early 1950s.<ref name=dyseac>{{cite journal |last1=Leiner |first1=Alan L. |title=System Specifications for the DYSEAC |journal=Journal of the ACM |date=April 1954 |volume=1 |issue=2 |pages=57–81 |doi=10.1145/320772.320773 |s2cid=15381094 |doi-access= }}</ref><ref name=lincoln_tx2>{{cite conference |title=The Lincoln TX-2 Input-Output System |first=James W. |last=Forgie |date=February 26–28, 1957 |conference=Western Joint Computer Conference: Techniques for Reliability |publisher=Association for Computing Machinery |___location=Los Angeles, California |pages=156–160 |isbn=9781450378611 |doi=10.1145/1455567.1455594 |doi-access=free }}</ref><ref name=intercomm_cells>{{cite conference |author=C. Y. Lee |title=Intercommunicating cells, basis for a distributed logic computer |date=December 4–6, 1962 |conference=Fall Joint Computer Conference |publisher=Association for Computing Machinery |___location=Philadelphia, Pennsylvania |pages=130–136 |doi=10.1145/1461518.1461531 |doi-access=free}}</ref> Some of these individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. 
These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing.<ref name="Dreyfus_1958_Gamma60">{{citation |title=System design of the Gamma 60 |author-first=Phillippe |author-last=Dreyfus |author-link=Philippe Dreyfus |work=Proceedings of the May 6–8, 1958, [[Western Joint Computer Conference]]: Contrasts in Computers |___location=Los Angeles |date=1958-05-08 |orig-year=1958-05-06 |id=IRE-ACM-AIEE '58 (Western) |publication-place=ACM, New York, NY, USA |pages=130–133 |url=https://www.computer.org/csdl/proceedings/afips/1958/5052/00/50520130.pdf |access-date=2017-04-03 |url-status=live |archive-url=https://web.archive.org/web/20170403224547/https://www.computer.org/csdl/proceedings/afips/1958/5052/00/50520130.pdf |archive-date=2017-04-03}}</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9–13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09–13, 1957). IRE-ACM-AIEE '57</ref><ref>Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03–05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.</ref><ref>Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03–05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 
1959), 313-335.</ref><ref>Estrin, G. 1960. [https://dl.acm.org/doi/abs/10.1145/1460361.1460365 Organization of computer systems: the fixed plus variable structure computer]. In Papers Presented At the May 3–5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03–05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.</ref>
To remain transparent, a system's elements may copy (replicate) portions of themselves onto collections of host elements. In times of need, a failed element's information can be retrieved from these host elements to continue processing, and eventually reconstitute the faulty element. This too is added complexity, and it does not end here. This replication of information throughout the system requires coordination, and therefore a coordinator. The coordinator oversees many aspects of a system's operation, unless that coordinator fails. In this event, some other element must be chosen and constituted a coordinator. This process adds complexity to the system. The complexity in the system can quickly add up, and these examples by no means sum to a total. Transparency envelope a system in an abstraction of extremely complex construction; but provide a user with a complete, consistent, and simplified local interface to hardware, devices, and resources. The various facets of a system contributing to this complexity are discussed individually, below.
 
In the mid-1970s, research produced important advances in distributed computing. These breakthroughs provided a solid, stable foundation for efforts that continued through the 1990s.
=== Modularity ===
 
The accelerating proliferation of [[Multiprocessing|multi-processor]] and [[multi-core processor]] systems research led to a resurgence of the distributed OS concept.
A distributed operating system is inherently modular by definition. However, a system's '''modularity''' speaks more to its composition and configuration, the rationale behind these, and ultimately their effectiveness. A system element could be composed of multiple layers of components. Each of these components might vary in granularity of subcomponent. These layers and component compositions would each have a coherent and rational configuration towards some purpose in the system. The purpose could be for a more simplified abstraction, raw communication efficiency, accommodating heterogeneous elements, processing parallelism and concurrency, or possibly to support an object-oriented programming paradigm. In any event, the scattered distribution of system elements is not random, but is most often the result of detailed design and careful planning.
 
===The DYSEAC===
=== Persistence of Entity state ===
One of the first efforts was the [[DYSEAC]], a general-purpose [[Synchronization (computer science)|synchronous]] computer. In one of the earliest publications of the [[Association for Computing Machinery]], in April 1954, a researcher at the [[National Bureau of Standards]]{{snd}} now the National [[nist|Institute of Standards and Technology]] ([[nist|NIST]]){{snd}} presented a detailed specification of the DYSEAC. The introduction focused upon the requirements of the intended applications, including flexible communications, but also mentioned other computers:
{{pad|2em}}Existence not time-bound; state persists across interruptions in system function
<br />{{pad|2em}}Resides in nonvolatile storage, synchronized with the current, stable, active copy
<br />{{pad|2em}}Subject to consistent and timely updates
<br />{{pad|2em}}Able to survive hardware failure
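The persistence pattern listed above — an active in-memory copy mirrored to nonvolatile storage so state survives interruptions — can be sketched as follows. The file name and state contents are purely illustrative.

```python
# Sketch: entity state is written to nonvolatile storage, kept synchronized
# with the active copy, and reloaded after an interruption.

import json
import os
import tempfile

state = {"counter": 7}                            # active in-memory copy

path = os.path.join(tempfile.gettempdir(), "entity_state.json")
with open(path, "w") as f:
    json.dump(state, f)                           # synchronized durable copy

# ...process interruption and restart would occur here...

with open(path) as f:
    recovered = json.load(f)                      # state survives the break
print(recovered == state)  # -> True
```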
 
{{blockquote|Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}
=== Efficiency ===
{{pad|2em}}Many issues can adversely affect system performance:
<br />{{pad|2em}}latency in interactions among distributed entities
<br />{{pad|4em}}local response facade requires remote entities' state be cached locally
<br />{{pad|4em}}and consistently synchronized to maintain the paradigm
<br />{{pad|2em}}Workload variations, delays, interruptions, faults, and/or crashes of entities
<br />{{pad|4em}}Distributed processing community assists when needed
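The local-response facade mentioned above — caching a remote entity's state locally and re-synchronizing it to maintain the paradigm — can be illustrated with a toy version check. Class and method names here are invented for illustration.

```python
# Sketch: a local facade serves cheap reads from a cached copy of remote
# state, refreshing the cache only when the remote version has moved on.

class RemoteEntity:
    def __init__(self, state):
        self.state, self.version = state, 0

    def update(self, state):
        self.state, self.version = state, self.version + 1

class LocalFacade:
    def __init__(self, remote):
        self.remote = remote
        self.cached_state = remote.state
        self.cached_version = remote.version

    def read(self):
        # local read; re-synchronize only on a detected version change
        if self.cached_version != self.remote.version:
            self.cached_state = self.remote.state
            self.cached_version = self.remote.version
        return self.cached_state

remote = RemoteEntity("v0")
facade = LocalFacade(remote)
print(facade.read())      # "v0", served from the local cache
remote.update("v1")
print(facade.read())      # "v1", after re-synchronization
```

In a real system the version check itself crosses the network, which is why maintaining the facade consistently is listed as a performance cost.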
 
The specification discussed the architecture of multi-computer systems, preferring a peer-to-peer rather than a master-slave relationship.
=== Replication ===
{{blockquote|Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}
{{pad|2em}}Duplication of state among selected distributed entities, and the synchronization of that state
<br />{{pad|2em}}Remote communication required to effect synchronization
 
This is one of the earliest examples of a computer with distributed control. The [[United States Department of the Army|Dept. of the Army]] reports<ref>Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961</ref> certified it as reliable, noting that it passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. This was a "[[portable computer]]", housed in a [[Tractor-trailer#Types of trailers|tractor-trailer]], with 2 attendant vehicles and [[Refrigerator truck|6 tons of refrigeration]] capacity.
=== Reliability ===
{{pad|2em}}Inherent redundancy across the distributed entities provides fault-tolerance
<br />{{pad|2em}}Consistent synchronized redundancy across N nodes, tolerates up to N-1 node faults
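The N−1 fault-tolerance claim above follows directly from consistent replication: as long as one synchronized replica survives, its copy of the state answers for the rest. A minimal sketch, with illustrative node names:

```python
# Sketch: with state consistently replicated on N nodes, a read succeeds
# as long as at least one replica is alive -- up to N-1 faults tolerated.

def read_replicated(replicas, failed):
    for node, value in replicas.items():
        if node not in failed:
            return value                      # any live replica suffices
    raise RuntimeError("all N replicas failed")

replicas = {"a": 42, "b": 42, "c": 42}        # N = 3, kept synchronized
print(read_replicated(replicas, failed={"a", "b"}))   # survives N-1 = 2 faults
```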
 
=== Flexibility ===
{{pad|2em}}OS has latitude in its degree of exposure to externals
<br />{{pad|2em}}Externals have latitude in the degree of exposure they accept
<br />{{pad|4em}}Coordination of process activity
<br />{{pad|4em}}Where to run: near the user? near resources? on an available CPU?
 
===The Lincoln TX-2===
Described as an experimental input-output system, the [[Lincoln TX-2]] emphasized flexible, simultaneously operational input-output devices, i.e., [[multiprogramming]]. The design of the TX-2 was modular, supporting a high degree of modification and expansion.<ref name=lincoln_tx2/>
=== Scalability ===
{{pad|2em}}node expansion
<br />{{pad|2em}}process migration
 
The system employed the Multiple-Sequence Program Technique. This technique allowed multiple [[program counter]]s to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process, but also the control flow of sequences and the switching of devices. Much of the design discussion concerned device sequencing.

Similar to the DYSEAC, the TX-2's separately programmed devices could operate simultaneously, increasing [[throughput]]. The full power of the central unit was available to any device. The TX-2 was another example of a system exhibiting distributed control, its central unit not having dedicated control.<!-- seems questionable unless the devices were explicitly other computers-->

== History ==

=== Pioneering inspirations ===
 
Research on computer operating systems was already well established by the middle of the twentieth century.<ref>Dreyfuss, P. 1958. System design of the Gamma 60. In Proceedings of the May 6-8, 1958, Western Joint Computer Conference: Contrasts in Computers (Los Angeles, California, May 06 - 08, 1958). IRE-ACM-AIEE '58 (Western). ACM, New York, NY, 130-133.</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09 - 13, 1957). IRE-ACM-AIEE '57</ref><ref>Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.</ref><ref>Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335.</ref><ref>Estrin, G. 1960. Organization of computer systems: the fixed plus variable structure computer. In Papers Presented At the May 3-5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03 - 05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.</ref> Early exploration of operating systems took place in the years leading up to 1950; shortly afterward, highly advanced research began on new systems to address new problems. During the following decade, many new questions were asked, many new problems were identified, and many solutions were developed and proven in controlled production environments.

===Intercommunicating Cells===
One early effort at abstracting memory access was Intercommunicating Cells, where a cell was composed of a collection of [[Computer data storage|memory]] elements. A memory element was basically a binary electronic [[flip-flop (electronics)|flip-flop]] or [[relay]]. Within a cell there were two types of elements, ''symbol'' and ''cell''. Each cell structure stores [[data]] in a [[String (computer science)|string]] of symbols, consisting of a [[Identifier|name]] and a set of [[parameter]]s. Information is linked through cell associations.<ref name=intercomm_cells/>
 
Intercommunicating Cells fundamentally broke from tradition in having no [[Program counter|counter]]s and no concept of [[Memory address|addressing memory]]; the theory contended that addressing is a wasteful and non-valuable [[indirection|level of indirection]]. Information was accessed in two ways, direct and cross-retrieval. Direct retrieval accepts a name and returns a parameter set. Cross-retrieval [[Projection (mathematics)|projects]] through parameter sets and returns a set of names containing the given [[subset]] of parameters. This was similar to a modified [[hash table]] [[data structure]] that allowed multiple [[Value (mathematics)|values]] (parameters) for each [[Unique key|key]] (name).
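The hash-table analogy above can be made concrete with a small model. This is only an illustration of the two retrieval modes using a plain dictionary in place of cellular memory; the names and parameters are invented.

```python
# Toy model of the two retrieval modes: a name maps to a parameter set
# (direct retrieval), and a parameter subset maps back to matching names
# (cross-retrieval).

cells = {
    "alpha": {"red", "round"},
    "beta":  {"red", "square"},
    "gamma": {"blue", "round"},
}

def direct(name):
    """Direct retrieval: accept a name, return its parameter set."""
    return cells[name]

def cross(params):
    """Cross-retrieval: names whose parameter sets contain the given subset."""
    return {name for name, p in cells.items() if params <= p}

print(sorted(direct("alpha")))   # ['red', 'round']
print(sorted(cross({"red"})))    # ['alpha', 'beta']
```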
 
{| style="width=100%;"
|- valign="top"
| colspan="3" |Cellular memory would have many advantages:
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || A major portion of a system's [[Boolean logic|logic]] is distributed within the associations of information stored in the cells,
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || This flow of information association is somewhat guided by the act of storing and retrieving,
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || The time required for storage and [[Information retrieval|retrieval]] is mostly [[constant time|constant]] and completely unrelated to the size and fill-factor of the memory
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size
|}
 
This [[Computer configuration|configuration]] was ideal for distributed systems. The constant-time projection through memory for storing and retrieval was inherently [[Atomic operation|atomic]] and [[Mutual exclusion|exclusive]]. The cellular memory's intrinsic distributed characteristics<!-- are these intrinsically distributed or merely abstract?--> would be invaluable. The impact on the [[User interface|user]], [[Computer hardware|hardware]]/[[Peripheral|device]], or [[Application programming interface]]s was indirect. The authors were considering distributed systems, stating:
 
{{blockquote|We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.|Chung-Yeol (C. Y.) Lee|''Intercommunicating Cells, Basis for a Distributed Logic Computer''}}
 
===Foundational work===
 
====Coherent memory abstraction====
{{pad|2em}} Algorithms for scalable synchronization on shared-memory multiprocessors <ref>Mellor-Crummey, J. M. and Scott, M. L. 1991. [https://dl.acm.org/doi/abs/10.1145/103727.103729 Algorithms for scalable synchronization on shared-memory multiprocessors]. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.</ref>
 
====File System abstraction====
{{pad|2em}}Measurements of a distributed file system<ref>Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. [http://people.csail.mit.edu/ledlie/resources/papers/1991/baker.ps Measurements of a distributed file system]. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13–16, 1991). SOSP '91. ACM, New York, NY, 198-212.</ref>
<br />{{pad|2em}}Memory coherence in shared virtual memory systems <ref>Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.</ref>
 
====Transaction abstraction====
{{pad|2em}}''Transactions''
<br />{{pad|4em}} Sagas <ref>Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27–29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.</ref>
 
{{pad|2em}}''Transactional Memory''
<br />{{pad|4em}}Composable memory transactions<ref>Harris, T., Marlow, S., [[Simon Peyton Jones|Peyton-Jones, S.]], and Herlihy, M. 2005. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.77.3476&rep=rep1&type=pdf Composable memory transactions]. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15–17, 2005). PPoPP '05. ACM, New York, NY, 48-60.</ref>
<br />{{pad|4em}}Transactional memory: architectural support for lock-free data structures <ref>Herlihy, M. and Moss, J. E. 1993. [http://hpl.americas.hp.net/techreports/Compaq-DEC/CRL-92-7.pdf Transactional memory: architectural support for lock-free data structures]. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16–19, 1993). ISCA '93. ACM, New York, NY, 289-300.</ref>
<br />{{pad|4em}}Software transactional memory for dynamic-sized data structures<ref>Herlihy, M., Luchangco, V., Moir, M., and Scherer, W. N. 2003. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.8787&rep=rep1&type=pdf Software transactional memory for dynamic-sized data structures]. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13–16, 2003). PODC '03. ACM, New York, NY, 92-101.</ref>
<br />{{pad|4em}}Software transactional memory<ref>Shavit, N. and Touitou, D. 1995. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.474.5928&rep=rep1&type=pdf Software transactional memory]. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing (Ottawa, Ontario, Canada, August 20–23, 1995). PODC '95. ACM, New York, NY, 204-213.</ref>
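The transactional-memory papers above share one optimistic idea: read a version of the data, compute speculatively, and commit only if no conflicting commit happened in between. The following is a single-threaded toy of that validate-then-commit loop, not a real STM; the class and function names are invented for illustration.

```python
# Toy validate-then-commit loop in the style of software transactional
# memory: snapshot a versioned variable, compute, then commit only if the
# version is unchanged; on conflict, retry the whole transaction.

class TVar:
    def __init__(self, value):
        self.value, self.version = value, 0

def atomically(tvar, update):
    while True:
        seen_version, snapshot = tvar.version, tvar.value
        new_value = update(snapshot)           # speculative computation
        if tvar.version == seen_version:       # validate: no one committed
            tvar.value = new_value
            tvar.version = seen_version + 1    # commit
            return new_value
        # conflict detected: loop and retry with fresh state

counter = TVar(0)
atomically(counter, lambda v: v + 1)
print(counter.value)  # -> 1
```

A real STM validates entire read/write sets atomically across threads; this sketch only shows the shape of the retry loop.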
 
====Persistence abstraction====
{{pad|2em}}OceanStore: an architecture for global-scale persistent storage <ref>Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Wells, C., and Zhao, B. 2000. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.439.4822&rep=rep1&type=pdf OceanStore: an architecture for global-scale persistent storage]. In Proceedings of the Ninth international Conference on Architectural Support For Programming Languages and Operating Systems (Cambridge, Massachusetts, United States). ASPLOS-IX. ACM, New York, NY, 190-201.</ref>
 
====Coordinator abstraction====
{{pad|2em}} Weighted voting for replicated data <ref>Gifford, D. K. 1979. [http://pages.cs.wisc.edu/~remzi/Classes/739/Spring2004/Papers/p150-gifford.pdf Weighted voting for replicated data]. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10–12, 1979). SOSP '79. ACM, New York, NY, 150-162</ref>
<br />{{pad|2em}} Consensus in the presence of partial synchrony <ref>Dwork, C., Lynch, N., and Stockmeyer, L. 1988. [https://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS-TM-270.pdf Consensus in the presence of partial synchrony]. J. ACM 35, 2 (Apr. 1988), 288-323.</ref>
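The quorum rule at the heart of weighted voting, cited above, is simple to state: with total vote weight N, the read quorum r and write quorum w must satisfy r + w > N (every read intersects the latest write) and 2w > N (no two writes can commit concurrently). A minimal check, with illustrative replica names:

```python
# Sketch of Gifford-style quorum validity for replicated data:
#   r + w > N  ensures every read quorum overlaps the latest write quorum
#   2w > N     ensures two write quorums always overlap

def quorums_valid(weights, r, w):
    total = sum(weights.values())
    return r + w > total and w > total / 2

weights = {"a": 1, "b": 1, "c": 1}        # three replicas, one vote each
print(quorums_valid(weights, r=2, w=2))   # True: 2 + 2 > 3 and 2*2 > 3
print(quorums_valid(weights, r=1, w=2))   # False: a read could miss the write
```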
 
====Reliability abstraction====
{{pad|2em}}''Sanity checks''
<br />{{pad|4em}}The Byzantine Generals Problem <ref>Lamport, L., Shostak, R., and Pease, M. 1982. [http://people.cs.uchicago.edu/~shanlu/teaching/33100_wi15/papers/byz.pdf The Byzantine Generals Problem]. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.</ref>
<br />{{pad|4em}}Fail-stop processors: an approach to designing fault-tolerant computing systems <ref>Schlichting, R. D. and Schneider, F. B. 1983. Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238.</ref>
 
{{pad|2em}}''Recoverability''
<br />{{pad|4em}}Distributed snapshots: determining global states of distributed systems<ref>Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63-75.</ref>
<br />{{pad|4em}}Optimistic recovery in distributed systems <ref>Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3</ref>
 
==Distributed computing models==
{{More citations needed section|date=January 2012}}
 
===Three basic distributions===
To better illustrate this point, examine three system [[Software architecture|architectures]]; centralized, decentralized, and distributed. In this examination, consider three structural aspects: organization, connection, and control. Organization describes a system's physical arrangement characteristics. Connection covers the communication pathways among nodes. Control manages the operation of the earlier two considerations.
 
====Organization====
A [[Centralized computing|centralized system]] has one level of structure, where all constituent elements directly depend upon a single control element. A [[decentralized system]] is hierarchical. The bottom level unites subsets of a system's entities. These entity subsets in turn combine at higher levels, ultimately culminating at a central master element. A distributed system is a collection of autonomous elements with no concept of levels.
 
====Connection====
Centralized systems connect constituents directly to a central master entity in a hub-and-spoke fashion. A decentralized system (also known as a [[Network operating system|network system]]) incorporates direct and indirect paths between constituent elements and the central entity. Typically this is configured as a hierarchy with only one shortest path between any two elements. Finally, the distributed operating system requires no pattern; direct and indirect connections are possible between any two elements. Consider the 1970s phenomena of "[[string art]]" or a [[spirograph]] drawing as a [[Fully connected network|fully connected system]], and the [[spider web|spider's web]] or the [[Interstate Highway System]] between U.S. cities as examples of a ''partially connected system''.
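The connection styles above differ sharply in how many links they need. For n elements, a hub-and-spoke arrangement needs n−1 links, a fully connected system needs n(n−1)/2, and a partially connected system falls in between. A quick illustration:

```python
# Link counts for the connection styles discussed above, for n elements.

def hub_and_spoke_links(n):
    """Centralized: every constituent has one link to the central entity."""
    return n - 1

def fully_connected_links(n):
    """Fully connected: one direct link between every pair of elements."""
    return n * (n - 1) // 2

print(hub_and_spoke_links(6))     # -> 5
print(fully_connected_links(6))   # -> 15
```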
 
====Control====
Centralized and decentralized systems have directed [[Software flow control|flows of connection]] to and from the central entity, while distributed systems communicate along arbitrary paths. This is the pivotal notion of the third consideration. Control involves allocating tasks and data to system elements, balancing efficiency, responsiveness, and complexity.
 
Centralized and decentralized systems offer more control, potentially easing administration by limiting options. Distributed systems are more difficult to control explicitly, but they scale better horizontally and offer fewer points of system-wide failure. Their associations conform to the needs imposed by the design, not to organizational chaos.
 
==Design considerations==
 
===Transparency===
''Transparency'' or ''single-system image'' refers to the ability of an application to treat the system on which it operates without regard to whether it is distributed and without regard to hardware or other implementation details. Many areas of a system can benefit from transparency, including access, ___location, performance, naming, and migration. The consideration of transparency directly affects decision making in every aspect of design of a distributed operating system. Transparency can impose certain requirements and/or restrictions on other design considerations.

Systems can optionally violate transparency to varying degrees to meet specific application requirements. For example, a distributed operating system may present a hard drive on one computer as "C:" and a drive on another computer as "G:". The user does not require any knowledge of device drivers or the drive's ___location; both devices work the same way, from the application's perspective. A less transparent interface might require the application to know which computer hosts the drive. Transparency domains:
* ''Location transparency'' – Location transparency comprises two distinct aspects of transparency, naming transparency and user mobility. Naming transparency requires that nothing in the physical or logical references to any system entity should expose any indication of the entity's ___location, or its local or remote relationship to the user or application. User mobility requires the consistent referencing of system entities, regardless of the system ___location from which the reference originates.<ref name="Sinha1997" />{{rp|20}}
* ''Access transparency'' – Local and remote system entities must remain indistinguishable when viewed through the user interface. The distributed operating system maintains this perception through the exposure of a single access mechanism for a system entity, regardless of that entity being local or remote to the user. Transparency dictates that any differences in methods of accessing any particular system entity—either local or remote—must be both invisible to, and undetectable by the user.<ref name="Gościński1991"/>{{rp|84}}<!--what is the difference between referencing and access?-->
* ''Migration transparency'' – Resources and activities migrate from one element to another controlled solely by the system and without user/application knowledge or action.<ref name="Galli2000">{{cite book|last=Galli|first=Doreen L.|title=Distributed Operating Systems: Concepts and Practice|url=https://archive.org/details/distributedopera00gall |url-access=registration|year=2000|publisher=Prentice Hall|isbn=978-0-13-079843-5}}</ref>{{rp|16}}
* ''Replication transparency'' – The process or fact that a resource has been duplicated on another element occurs under system control and without user/application knowledge or intervention.<ref name="Galli2000" />{{rp|16}}
* ''Concurrency transparency'' – Users/applications are unaware of and unaffected by the presence/activities of other users.<ref name="Galli2000" />{{rp|16}}
* ''Failure transparency'' – The system is responsible for detection and remediation of system failures. No user knowledge/action is involved other than waiting for the system to resolve the problem.<ref name="Chow1997" />{{rp|30}}
* ''Performance Transparency'' – The system is responsible for the detection and remediation of local or global performance shortfalls. Note that system policies may prefer some users/user classes/tasks over others. No user knowledge or interaction is involved.<ref name="Sinha1997" />{{rp|23}}
* ''Size/Scale transparency'' – The system is responsible for managing its geographic reach, number of nodes, and level of node capability without any required user knowledge or interaction.<ref name="Sinha1997" />{{rp|23}}
* ''Revision transparency'' – The system is responsible for upgrades and revisions and changes to system infrastructure without user knowledge or action.<ref name="Chow1997" />{{rp|30}}
* ''Control transparency'' – The system is responsible for providing all system information, constants, properties, configuration settings, etc. in a consistent appearance, connotation, and denotation to all users and applications.<ref name="Gościński1991"/>{{rp|84}}
* ''Data transparency'' – The system is responsible for providing data to applications without user knowledge or action relating to where the system stores it.<ref name="Gościński1991"/>{{rp|85}}
* ''Parallelism transparency'' – The system is responsible for exploiting any ability to parallelize task execution without user knowledge or interaction. Arguably the most difficult aspect of transparency, and described by Tanenbaum as the "Holy grail" for distributed system designers.<ref name="Tanenbaum1995">{{cite book|last=Tanenbaum|first=Andrew S.|title=Distributed Operating Systems|url=https://archive.org/details/unset0000unse_h1q3|url-access=registration|year=1995|publisher=Prentice Hall|isbn=978-0-13-219908-7}}</ref>{{rp|23–25}}
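The naming and migration domains above can be illustrated with a minimal sketch, not drawn from any real distributed OS: a hypothetical name service maps a logical name to a (node, local path) pair, so applications see one stable name while the system is free to relocate the underlying resource. All names and node identifiers below are invented for illustration.

```python
class NameService:
    def __init__(self):
        self._registry = {}          # logical name -> (node, local_path)

    def register(self, name, node, local_path):
        self._registry[name] = (node, local_path)

    def resolve(self, name):
        return self._registry[name]

def read_file(ns, name):
    node, path = ns.resolve(name)
    # A real system would issue a remote request to `node`; this is a stub.
    return f"contents of {path} fetched from {node}"

ns = NameService()
ns.register("/docs/report", node="nodeA", local_path="/disk0/report")
before = read_file(ns, "/docs/report")

# Migration transparency: the system moves the file; the name is unchanged.
ns.register("/docs/report", node="nodeB", local_path="/disk1/report")
after = read_file(ns, "/docs/report")
```

The application's reference, "/docs/report", never changes; only the registry entry does, which is the essence of naming and migration transparency.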
 
===Inter-process communication===
[[Inter-Process Communication]] (IPC) is the implementation of general communication, process interaction, and [[dataflow]] between [[Thread (computer science)|threads]] and/or [[Process (computing)|processes]] both within a node, and between nodes in a distributed OS. The intra-node and inter-node communication requirements drive low-level IPC design, which is the typical approach to implementing communication functions that support transparency. In this sense, inter-process communication is the greatest underlying concept in the low-level design considerations of a distributed operating system.
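The request/reply pattern at the heart of message-based IPC can be sketched as follows. This is an illustrative model only: two threads stand in for two nodes, and a pair of queues stands in for the network channel between them.

```python
import queue
import threading

def echo_server(inbox, outbox):
    # Receive messages from the inbox and reply on the outbox until told to stop.
    while True:
        msg = inbox.get()
        if msg == "STOP":
            break
        outbox.put(("reply", msg.upper()))

# Two queues stand in for the channel between node A and node B.
a_to_b, b_to_a = queue.Queue(), queue.Queue()
node_b = threading.Thread(target=echo_server, args=(a_to_b, b_to_a))
node_b.start()

a_to_b.put("ping")          # node A sends a request
reply = b_to_a.get()        # and blocks until the reply arrives
a_to_b.put("STOP")
node_b.join()
```

The same send/receive structure applies whether the two endpoints share a machine or sit on different nodes, which is what lets a distributed OS layer transparency on top of IPC.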
 
===Process management===
[[Process management (computing)|Process management]] provides policies and mechanisms for effective and efficient sharing of resources between distributed processes. These policies and mechanisms support operations involving the allocation and de-allocation of processes and ports to processors, as well as mechanisms to run, suspend, migrate, halt, or resume process execution. While these resources and operations can be either local or remote with respect to each other, the distributed OS maintains state and synchronization over all processes in the system.
 
As an example, [[Load balancing (computing)|load balancing]] is a common process management function. Load balancing monitors node performance and is responsible for shifting activity across nodes when the system is out of balance. One load balancing function is picking a process to move. The kernel may employ several selection mechanisms, including priority-based choice. This mechanism chooses a process based on a policy such as 'newest request'. The system implements the policy.
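The selection step described above can be sketched as a small policy function. The node data, the load metric, and the 'newest request' tiebreak are illustrative assumptions, not the mechanism of any particular kernel.

```python
def pick_process_to_move(nodes, policy="newest"):
    """Pick (node, pid) to migrate away from the most loaded node."""
    busiest = max(nodes, key=lambda n: nodes[n]["load"])
    procs = nodes[busiest]["procs"]           # list of (pid, arrival_time)
    if policy == "newest":                    # 'newest request' policy
        pid, _ = max(procs, key=lambda p: p[1])
    else:                                     # fall back to oldest request
        pid, _ = min(procs, key=lambda p: p[1])
    return busiest, pid

nodes = {
    "n1": {"load": 0.9, "procs": [(101, 5), (102, 9)]},
    "n2": {"load": 0.2, "procs": [(201, 7)]},
}
node, pid = pick_process_to_move(nodes)
```

Here the mechanism (scan nodes, rank processes) is fixed, while the policy (which process counts as the best candidate) is a pluggable parameter, mirroring the policy/mechanism separation in the text.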
 
===Resource management===
[[Resource (computer science)|Systems resources]] such as memory, files, devices, etc. are distributed throughout a system, and at any given moment, any of these nodes may have light to idle workloads. ''Load sharing'' and load balancing require many policy-oriented decisions, including finding idle CPUs, deciding when to move a process, and choosing which process to move. Many [[algorithm]]s exist to aid in these decisions; however, this calls for a second level of decision making policy in choosing the algorithm best suited for the scenario, and the conditions surrounding the scenario.<!--how is this different from process management?-->
 
===Reliability===
Distributed OS can provide the necessary resources and services to achieve high levels of ''reliability'', or the ability to prevent and/or recover from errors. [[Fault (technology)|Faults]] are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults.
 
The primary methods for dealing with faults include ''fault avoidance'', [[Fault-tolerant design|fault tolerance]], and ''fault detection and recovery''. Fault avoidance covers proactive measures taken to minimize the occurrence of faults. These proactive measures can be in the form of ''[[transaction processing|transactions]]'', [[Replication (computer science)|replication]] and [[Replication (computer science)#Primary-backup and multi-primary replication|backups]]. Fault tolerance is the ability of a system to continue operation in the presence of a fault. In the event of a fault, the system should detect the fault and recover full functionality. In any event, any actions taken should make every effort to preserve the ''single system image''.
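One common fault-tolerance pattern, replication with failover, can be sketched as below. The replica functions and their behavior are simulated stand-ins: a real system would contact remote nodes, but the control flow (try a copy, detect the fault, fall through to the next copy) is the same.

```python
def read_with_failover(replicas, key):
    # Try each replica in turn; a single faulty node is masked by the others,
    # preserving the single-system image from the application's point of view.
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except ConnectionError as exc:
            last_error = exc          # fault detected; try the next copy
    raise RuntimeError("all replicas failed") from last_error

def faulty_replica(key):
    raise ConnectionError("replica unreachable")

def healthy_replica(key):
    return {"x": 42}[key]

value = read_with_failover([faulty_replica, healthy_replica], "x")
```

The caller never learns that the first replica failed; failure transparency is achieved by detection plus redundancy rather than by preventing the fault.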
 
===Availability===
[[Availability]] is the fraction of time during which the system can respond to requests.
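The fraction defined above is commonly estimated from mean time between failures (MTBF) and mean time to repair (MTTR). The formula is standard; the figures below are invented for illustration.

```python
def availability(mtbf_hours, mttr_hours):
    # Fraction of time the system can respond to requests.
    return mtbf_hours / (mtbf_hours + mttr_hours)

a = availability(mtbf_hours=999.0, mttr_hours=1.0)   # 0.999, "three nines"
```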
 
===Performance===
Many [[Benchmark (computing)|benchmark metrics]] quantify [[Computer performance|performance]]; throughput, response time, job completions per unit time, system utilization, etc. With respect to a distributed OS, performance most often distills to a balance between [[Parallel computing|process parallelism]] and IPC.{{Citation needed|date=January 2012}} Managing the [[Granularity#In computing|task granularity]] of parallelism in a sensible relation to the messages required for support is extremely effective.{{Citation needed|date=January 2012}} Also, identifying when it is more beneficial to [[Process migration|migrate a process]] to its data, rather than copy the data, is effective as well.{{Citation needed|date=January 2012}}
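The migrate-versus-copy judgement above amounts to a cost comparison. A back-of-envelope sketch, with purely illustrative byte counts and a simple linear cost model:

```python
def cheaper_to_migrate_process(process_state_bytes, data_bytes,
                               per_byte_cost=1.0, message_overhead=512):
    # Compare shipping the process state to the data's node against
    # shipping the data to the process's node.
    move_process = process_state_bytes * per_byte_cost + message_overhead
    move_data = data_bytes * per_byte_cost + message_overhead
    return move_process < move_data

# A small process working on a large remote dataset: move the process.
decision = cheaper_to_migrate_process(process_state_bytes=64_000,
                                      data_bytes=10_000_000)
```

Real systems weigh many more factors (latency, contention, residual dependencies), but the asymmetry between a small process image and a large data set is the core of the argument.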
 
===Synchronization===
Cooperating [[Concurrent computing|concurrent processes]] have an inherent need for [[Synchronization (computer science)|synchronization]], which ensures that changes happen in a correct and predictable fashion. Three basic situations define the scope of this need:
 
:* one or more processes must synchronize at a given point for one or more other processes to continue,
:* one or more processes must wait for an asynchronous condition in order to continue,
:* or a process must establish exclusive access to a shared resource.
 
Improper synchronization can lead to multiple failure modes including loss of [[ACID|atomicity, consistency, isolation and durability]], [[Deadlock (computer science)|deadlock]], [[livelock]] and loss of [[serializability]].{{Citation needed|date=January 2012}}
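The third situation, exclusive access to a shared resource, can be sketched with a shared counter. Without the lock, concurrent increments can interleave and lose updates; with it, the outcome is predictable. The thread and iteration counts are arbitrary.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    global counter
    for _ in range(iterations):
        with lock:            # establish exclusive access to the counter
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a distributed OS the same need arises across nodes, where no shared lock variable exists, which is why distributed mutual-exclusion algorithms such as Maekawa's are needed.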
 
===Flexibility===
[[Flexibility (engineering)|Flexibility]] in a distributed operating system is enhanced through the modular characteristics of the distributed OS, and by providing a richer set of higher-level services. The completeness and quality of the kernel/microkernel simplifies implementation of such services, and potentially allows greater choice among providers of such services.{{Citation needed|date=April 2012}}
 
==Research==
 
===Replicated model extended to a component object model===
{{pad|2em}}Architectural Design of E1 Distributed Operating System<ref>L.B. Ryzhyk, A.Y. Burtsev. Architectural design of E1 distributed operating system. System Research and Information Technologies international scientific and technical journal, October 2004, Kiev, Ukraine.</ref>
<br />{{pad|2em}}The Cronus distributed operating system<ref>Vinter, S. T. and Schantz, R. E. 1986. The Cronus distributed operating system. In Proceedings of the 2nd Workshop on Making Distributed Systems Work (Amsterdam, Netherlands, September 08–10, 1986). EW 2. ACM, New York, NY, 1-3.</ref>
<br />{{pad|2em}}Design and development of MINIX distributed operating system<ref>Ramesh, K. S. 1988. Design and development of MINIX distributed operating system. In Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science (Atlanta, Georgia, United States). CSC '88. ACM, New York, NY, 685.</ref>
 
===Complexity/Trust exposure through accepted responsibility===
:Scale and performance in the Denali isolation kernel.<ref>Whitaker, A., Shaw, M., and Gribble, S. D. 2002. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation</ref>
 
===Multi/Many-core focused systems===
:The multikernel: a new OS architecture for scalable multicore systems.<ref>Baumann, A., Barham, P., Dagand, P., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. 2009. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11–14, 2009). SOSP '09.</ref>
:Corey: an Operating System for Many Cores.<ref>S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. In Proceedings of the 2008 Symposium on Operating Systems Design and Implementation (OSDI), December 2008.</ref>
:Almos: Advanced Locality Management Operating System for cc-NUMA Many-Cores.<ref>Almaless, G. and Wajsbürt, F. 2011. In Proceedings of the 5th national seminar of GDR SoC-SIP, Lyon, France, 2011.</ref>
 
===Distributed processing over extremes in heterogeneity===
 
:Helios: heterogeneous multiprocessing with satellite kernels.<ref>Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. 2009. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11–14, 2009). SOSP '09.</ref>
 
===Effective and stable in multiple levels of complexity===
 
:Tessellation: Space-Time Partitioning in a Manycore Client OS.<ref>Rose Liu, Kevin Klues, and Sarah Bird, University of California at Berkeley; Steven Hofmeyr, Lawrence Berkeley National Laboratory; [[Krste Asanović]] and John Kubiatowicz, University of California at Berkeley. HotPar09.</ref>
==See also==
* {{annotated link|Distributed computing}}
* {{annotated link|HarmonyOS}}
* {{annotated link|OpenHarmony}}
* {{annotated link|BlueOS}}
* {{annotated link|Plan 9 from Bell Labs}}
* {{annotated link|Inferno (operating system)|Inferno}}
* {{annotated link|MINIX}}
* {{annotated link|Single system image}} (SSI)
* {{annotated link|Computer systems architecture}}
* {{annotated link|Multikernel}}
* {{annotated link|Operating System Projects}}
* {{annotated link|Edsger W. Dijkstra Prize in Distributed Computing}}
* {{annotated link|List of distributed computing conferences}}
* {{annotated link|List of volunteer computing projects}}
 
==References==
{{Reflist}}
 
==Further reading==
* {{cite book|last1=Chow|first1=Randy|author2=Theodore Johnson|title=Distributed Operating Systems and Algorithms|url=https://books.google.com/books?id=J4MZAQAAIAAJ|year=1997|publisher=Addison Wesley|isbn=978-0-201-49838-7}}
* {{cite book|last=Sinha|first=Pradeep Kumar |title=Distributed Operating Systems: Concepts and Design|url=https://archive.org/details/distributedopera0000sinh|url-access=registration|year=1997|publisher=IEEE Press|isbn=978-0-7803-1119-0}}
* {{cite book|last=Galli|first=Doreen L.|title=Distributed Operating Systems: Concepts and Practice|url=https://archive.org/details/distributedopera00gall |url-access=registration|year=2000|publisher=Prentice Hall|isbn=978-0-13-079843-5}}
 
==External links==
{{Prone to spam|date=May 2022}}
<!-- {{No more links}}
 
Please be cautious adding more external links.
 
Wikipedia is not a collection of links and should not be used for advertising.
 
Excessive or inappropriate links will be removed.
 
See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details.
 
If there are already suitable links, propose additions or replacements on
the article's talk page.
 
-->
 
{{Distributed operating systems}}
{{Operating system}}
{{Authority control}}
 
{{DEFAULTSORT:Distributed Operating System}}
<!--- Categories --->
[[Category:Computer networks]]
[[Category:Distributed operating systems| ]]
[[Category:History of software]]
[[Category:Operating systems]]