{{Short description|Operating system designed to operate on multiple systems over a network computer}}
A '''distributed operating system''' is system software over a collection of independent, [[Computer network|networked]], [[Inter-process communication|communicating]], and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs.<ref name="Tanenbaum1993">{{cite journal |last=Tanenbaum |first=Andrew S |date=September 1993 |title=Distributed operating systems anno 1992. What have we learned so far? |journal=Distributed Systems Engineering |volume=1 |issue=1 |pages=3–10 |doi=10.1088/0967-1846/1/1/001|bibcode=1993DSE.....1....3T |doi-access=free }}</ref> Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners.<ref name="Nutt1992">{{cite book|last=Nutt|first=Gary J.|title=Centralized and Distributed Operating Systems|url=https://archive.org/details/centralizeddistr0000nutt |url-access=registration|year=1992|publisher=Prentice Hall|isbn=978-0-13-122326-4}}</ref> The first is a ubiquitous minimal [[kernel (operating system)|kernel]], or [[microkernel]], that directly controls that node's hardware. Second is a higher-level collection of ''system management components'' that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications.<ref name="Gościński1991">{{cite book|last=Gościński|first=Andrzej|title=Distributed Operating Systems: The Logical Design|url=https://books.google.com/books?id=ZnYhAQAAIAAJ|year=1991|publisher=Addison-Wesley Pub. Co.|isbn=978-0-201-41704-3}}</ref>
The microkernel and the management components collection work together. They support the system's goal of integrating multiple resources and processing functionality into an efficient and stable system.<ref name="Fortier1986">{{cite book|last=Fortier|first=Paul J.|title=Design of Distributed Operating Systems: Concepts and Technology|url=https://books.google.com/books?id=F7QmAAAAMAAJ|year=1986|publisher=Intertext Publications|isbn=9780070216211}}</ref> This seamless integration of individual nodes into a global system is referred to as ''transparency'', or ''[[single system image]]'', describing the illusion provided to users that the global system is a single computational entity.
{{TOC limit|3}}
 
==Description==
[[File:OS-structure2.svg|thumb|right|400px|Structure of monolithic kernel, microkernel and hybrid kernel-based operating systems]]
A distributed OS provides the essential services and functionality required of an OS but adds attributes and particular [[Computer configuration|configurations]] to allow it to support additional requirements such as increased scale and availability. To a user, a distributed OS works in a manner similar to a single-node, [[Monolithic kernel|monolithic operating system]]. That is, although it consists of multiple nodes, it appears to users and applications as a single node.
 
Separating minimal system-level functionality from additional user-level modular services provides a "[[separation of mechanism and policy]]". Mechanism and policy can be simply interpreted as "how something is done" versus "what should be done", respectively. This separation increases flexibility and scalability.
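The separation of mechanism and policy can be illustrated with a small sketch. The class and function names below are purely hypothetical, not drawn from any real distributed OS: the ''mechanism'' (a run queue and a dispatch routine) stays fixed, while the ''policy'' (which task runs next) is an interchangeable function.

```python
class Scheduler:
    """Mechanism: holds runnable tasks and dispatches the policy's choice."""

    def __init__(self, policy):
        self.policy = policy   # the policy is a swappable function
        self.ready = []        # list of (priority, task_name) pairs

    def submit(self, task, priority=0):
        self.ready.append((priority, task))

    def next_task(self):
        if not self.ready:
            return None
        choice = self.policy(self.ready)  # policy decides *which* task
        self.ready.remove(choice)         # mechanism performs the dispatch
        return choice[1]

# Two interchangeable policies over the same mechanism:
fifo = lambda ready: ready[0]                # first submitted wins
highest_priority = lambda ready: max(ready)  # largest priority wins

s = Scheduler(highest_priority)
s.submit("log_rotate", priority=1)
s.submit("page_fault_handler", priority=9)
print(s.next_task())   # -> page_fault_handler
```

Swapping `highest_priority` for `fifo` changes scheduling behavior without touching the dispatch mechanism, which is the flexibility the separation is meant to buy.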
 
==Overview==
 
===The kernel===
At each [[Locale (computer hardware)|locale]] (typically a node), the kernel provides a minimally complete set of node-level utilities necessary for operating a node's underlying hardware and resources. These mechanisms include allocation, management, and disposition of a node's resources, processes, communication, and [[input/output]] management support functions.<ref name="Hansen2001">{{cite book|editor=Hansen, Per Brinch|title=Classic Operating Systems: From Batch Processing to Distributed Systems|url=https://books.google.com/books?id=-PDPBvIPYBkC|year=2001|publisher=Springer|isbn=978-0-387-95113-3}}</ref> Within the kernel, the communications sub-system is of foremost importance for a distributed OS.<ref name="Gościński1991"/>
 
In a distributed OS, the kernel often supports a minimal set of functions, including low-level [[address space]] management, [[thread (computing)|thread]] management, and [[inter-process communication]] (IPC). A kernel of this design is referred to as a [[microkernel]].<ref>Using LOTOS for specifying the CHORUS distributed operating system kernel Pecheur, C. 1992. Using LOTOS for specifying the CHORUS distributed operating system kernel. Comput. Commun. 15, 2 (Mar. 1992), 93-102.</ref><ref>COOL: kernel support for object-oriented environments Habert, S. and Mosseri, L. 1990. COOL: kernel support for object-oriented environments. In Proceedings of the European Conference on Object-Oriented Programming on Object-Oriented Programming Systems, Languages, and Applications (Ottawa, Canada). OOPSLA/ECOOP '90. ACM, New York, NY, 269-275.</ref> Its modular nature enhances reliability and security, essential features for a distributed OS.<ref name="Sinha1997">{{cite book|last=Sinha|first=Pradeep Kumar |title=Distributed Operating Systems: Concepts and Design|url=https://archive.org/details/distributedopera0000sinh|url-access=registration|year=1997|publisher=IEEE Press|isbn=978-0-7803-1119-0}}</ref>
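The IPC primitive at the heart of such a microkernel can be sketched as a pair of send/receive operations on per-port message queues. This is an illustrative toy, not any real kernel's API; all names here are hypothetical, and higher-level services (naming, file service, and so on) would be built as user-level processes on top of these primitives.

```python
from queue import Queue

class Microkernel:
    """Toy microkernel exposing only message-passing primitives."""

    def __init__(self):
        self.mailboxes = {}   # one message queue per port

    def create_port(self, name):
        self.mailboxes[name] = Queue()

    def send(self, port, message):
        self.mailboxes[port].put(message)   # asynchronous send

    def receive(self, port):
        return self.mailboxes[port].get()   # blocks until a message arrives

kernel = Microkernel()
kernel.create_port("file_server")

# A client asks the (user-level) file server to open a file, via the kernel:
kernel.send("file_server", ("open", "/etc/hosts"))
op, path = kernel.receive("file_server")
print(op, path)   # -> open /etc/hosts
```

Because services interact only through these primitives, the same send/receive calls can be transparently routed to a remote node's mailbox, which is why the communications sub-system dominates microkernel design in a distributed OS.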
 
[[Image:System Management Components.PNG|thumbnail|right|175px|alt=General overview of system management components that reside above the microkernel.|System management components overview]]
 
===System management===
System management components are software processes that define the node's ''policies''. These components are the part of the OS outside the kernel. These components provide higher-level communication, process and resource management, reliability, performance and security. The components match the functions of a single-entity system, adding the transparency required in a distributed environment.<ref name="Gościński1991"/>
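The transparency these components add can be sketched as a user-level service registry: callers name a ''service'', never the node that hosts it. The sketch below is purely illustrative; the `Registry` class and all identifiers are hypothetical, and a real system would resolve the name to a remote procedure call rather than a local function.

```python
class Registry:
    """User-level management component providing ___location transparency."""

    def __init__(self):
        self.services = {}   # service name -> (hosting node, handler)

    def register(self, name, node, handler):
        self.services[name] = (node, handler)

    def call(self, name, *args):
        node, handler = self.services[name]  # caller never sees 'node'
        return handler(*args)                # could be local call or RPC

registry = Registry()
registry.register("time", node="nodeB", handler=lambda: 1700000000)

# The caller asks for the *service*, not the machine hosting it:
print(registry.call("time"))   # -> 1700000000
```

If the "time" service later migrates to another node, only the registry entry changes; callers are unaffected, which is the single-system-image illusion in miniature.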
 
The distributed nature of the OS requires additional services to support a node's responsibilities to the global system. In addition, the system management components accept the "defensive" responsibilities of reliability, availability, and persistence. These responsibilities can conflict with each other. A consistent approach, balanced perspective, and a deep understanding of the overall system can assist in identifying [[diminishing returns]]; for example, replicating data across more nodes improves availability, but beyond a point each added replica raises the cost of keeping copies consistent more than it improves fault tolerance. Separation of policy and mechanism mitigates such conflicts.<ref name="Chow1997">{{cite book|last1=Chow|first1=Randy|author2=Theodore Johnson|title=Distributed Operating Systems and Algorithms|url=https://books.google.com/books?id=J4MZAQAAIAAJ|year=1997|publisher=Addison Wesley|isbn=978-0-201-49838-7}}</ref>
 
=== Working together as an operating system ===
The architecture and design of a distributed operating system must realize both individual node and global system goals. Architecture and design must be approached in a manner consistent with separating policy and mechanism. In doing so, a distributed operating system attempts to provide an efficient and reliable distributed computing framework allowing for a minimal user awareness of the underlying command and control efforts.<ref name="Sinha1997" />
 
The multi-level collaboration between a kernel and the system management components, and in turn between the distinct nodes in a distributed operating system is the functional challenge of the distributed operating system. This is the point in the system that must maintain a perfect harmony of purpose, and simultaneously maintain a complete disconnect of intent from implementation. This challenge is the distributed operating system's opportunity to produce the foundation and framework for a reliable, efficient, available, robust, extensible, and scalable system. However, this opportunity comes at a very high cost in complexity.
===Perspectives: past, present, and future===
Many notable experts date the earliest complete distributed systems, capable of being considered and implemented as a whole, to the early 1970s. Research and experimentation efforts began in earnest in the mid-to-late 1970s and continued into the early 1990s, with a few implementations achieving modest commercial success. The subject of distributed operating systems, however, has a much richer historical perspective when design issues are considered individually with respect to the primordial strides towards distributed computing: several fundamental and pioneering implementations of primitive distributed system and component concepts date back to the early 1950s. Looking to the modern distributed system and its future, the accelerating proliferation of multiprocessor systems and multi-core processors has led to a re-emergence of the distributed system concept. The inherent challenges of many-core and multiprocessor systems have led to an enormous increase in distributed systems research, much of which investigates plausible paradigms for the future of distributed computing.
 
===The price of complexity===
In a distributed operating system, the exceptional degree of inherent complexity could easily render the entire system an anathema to any user. As such, the logical price of realizing a distributed operating system must be calculated in terms of overcoming vast amounts of complexity in many areas, and on many levels. This calculation includes the depth, breadth, and range of design investment and architectural planning required in achieving even the most modest implementation.<ref>Surajbali, B., Coulson, G., Greenwood, P., and Grace, P. 2007. Augmenting reflective middleware with an aspect orientation support layer. In Proceedings of the 6th international Workshop on Adaptive and Reflective Middleware: Held At the ACM/IFIP/USENIX international Middleware Conference (Newport Beach, CA, November 26–30, 2007). ARM '07. ACM, New York, NY, 1-6.</ref>
 
These design and development considerations are critical and unforgiving. For instance, a deep understanding of a distributed operating system's overall architectural and design detail is required at an exceptionally early point.<ref name="Tanenbaum1993"/> An exhausting array of design considerations is inherent in the development of a distributed operating system, and each of these considerations can potentially affect many of the others to a significant degree. This demands a balanced approach to the individual design considerations and many of their permutations. As an aid in this effort, most rely on documented experience and research in distributed computing.
 
==History==
Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success.
 
Fundamental and pioneering implementations of primitive distributed operating system component concepts date to the early 1950s.<ref name=dyseac>{{cite journal |last1=Leiner |first1=Alan L. |title=System Specifications for the DYSEAC |journal=Journal of the ACM |date=April 1954 |volume=1 |issue=2 |pages=57–81 |doi=10.1145/320772.320773 |s2cid=15381094 |doi-access= }}</ref><ref name=lincoln_tx2>{{cite conference |title=The Lincoln TX-2 Input-Output System |first=James W. |last=Forgie |date=February 26–28, 1957 |conference=Western Joint Computer Conference: Techniques for Reliability |publisher=Association for Computing Machinery |___location=Los Angeles, California |pages=156–160 |isbn=9781450378611 |doi=10.1145/1455567.1455594 |doi-access=free }}</ref><ref name=intercomm_cells>{{cite conference |author=C. Y. Lee |title=Intercommunicating cells, basis for a distributed logic computer |date=December 4–6, 1962 |conference=Fall Joint Computer Conference |publisher=Association for Computing Machinery |___location=Philadelphia, Pennsylvania |pages=130–136 |doi=10.1145/1461518.1461531 |doi-access=free}}</ref> Some of these individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. 
These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing.<ref name="Dreyfus_1958_Gamma60">{{citation |title=System design of the Gamma 60 |author-first=Phillippe |author-last=Dreyfus |author-link=Philippe Dreyfus |work=Proceedings of the May 6–8, 1958, [[Western Joint Computer Conference]]: Contrasts in Computers |___location=Los Angeles |date=1958-05-08 |orig-year=1958-05-06 |id=IRE-ACM-AIEE '58 (Western) |publication-place=ACM, New York, NY, USA |pages=130–133 |url=https://www.computer.org/csdl/proceedings/afips/1958/5052/00/50520130.pdf |access-date=2017-04-03 |url-status=live |archive-url=https://web.archive.org/web/20170403224547/https://www.computer.org/csdl/proceedings/afips/1958/5052/00/50520130.pdf |archive-date=2017-04-03}}</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9–13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09–13, 1957). IRE-ACM-AIEE '57</ref><ref>Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03–05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.</ref><ref>Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03–05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 
1959), 313-335.</ref><ref>Estrin, G. 1960. [https://dl.acm.org/doi/abs/10.1145/1460361.1460365 Organization of computer systems: the fixed plus variable structure computer]. In Papers Presented At the May 3–5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03–05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.</ref>
== Distributed computing models ==
 
In the mid-1970s, research produced important advances in distributed computing. These breakthroughs provided a solid, stable foundation for efforts that continued through the 1990s.
=== The nature of distribution ===
The unique nature of the distributed operating system is both subtle and complex. A distributed operating system's hardware infrastructure elements are not centralized; that is, the elements do not have tight proximity to one another at a single ___location. A given distributed operating system's structural elements could reside in various rooms within a building, or in various buildings around the world. This spatial dissemination defines the system's decentralization; however, the distributed operating system is a distributed system, not simply a decentralized one.
 
This distinction is the source of the subtlety and complexity. While decentralized systems and distributed systems are both spatially diverse, it is the specific manner of, and relative degree of, linkage between the elements, or nodes, in the systems that differentiates the two. In the case of these two types of operating system, these linkages are the lines of [[Inter-process communication|communication]] between the nodes of the system.

The accelerating proliferation of [[Multiprocessing|multi-processor]] and [[multi-core processor]] systems research led to a resurgence of the distributed OS concept.
 
===The DYSEAC===
One of the first efforts was the [[DYSEAC]], a general-purpose [[Synchronization (computer science)|synchronous]] computer. In one of the earliest publications of the [[Association for Computing Machinery]], in April 1954, a researcher at the [[National Bureau of Standards]]{{snd}} now the National [[nist|Institute of Standards and Technology]] ([[nist|NIST]]){{snd}} presented a detailed specification of the DYSEAC. The introduction focused upon the requirements of the intended applications, including flexible communications, but also mentioned other computers:
 
{{blockquote|Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}
 
The specification discussed the architecture of multi-computer systems, preferring peer-to-peer rather than master-slave.
{{blockquote|Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}
 
This is one of the earliest examples of a computer with distributed control. The [[United States Department of the Army|Dept. of the Army]] reports<ref>Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961</ref> certified it reliable and that it passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. This was a "[[portable computer]]", housed in a [[Tractor-trailer#Types of trailers|tractor-trailer]], with 2 attendant vehicles and [[Refrigerator truck|6 tons of refrigeration]] capacity.
=== Conclusions ===
Lastly, as to the nature of the distributed system: it has been argued that a distributed operating system is not necessarily an operating system at all, but simply ''is'' the distributed system. This view is commonly justified by pointing to the operating system's deep and inextricable integration into the distributed system, and to its absolute and singular focus on sustaining and maintaining that system. However, it is important to remember the separation of mechanism and policy. The distributed operating system and its mechanism are not affected by any degree of integration, and no amount of focus on providing this mechanism changes the responsibility for policy, or the expectation of results, at the distributed-system level. As mentioned earlier, a [[Square_(geometry)#Other_facts|square]] is a [[rectangle]]; no level of effort exerted by the square in maintaining four equivalent dimensions changes that.
 
===Lincoln TX-2===
Described as an experimental input-output system, the [[Lincoln TX-2]] emphasized flexible, simultaneously operational input-output devices, i.e., [[multiprogramming]]. The design of the TX-2 was modular, supporting a high degree of modification and expansion.<ref name=lincoln_tx2/>

The system employed the Multiple-Sequence Program Technique. This technique allowed multiple [[program counter]]s to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process, but also the control flow of sequences and switching of devices as well. Much discussion related to device sequencing.

Similar to the DYSEAC, the TX-2's separately programmed devices could operate simultaneously, increasing [[throughput]]. The full power of the central unit was available to any device. The TX-2 was another example of a system exhibiting distributed control, its central unit not having dedicated control.<!-- seems questionable unless the devices were explicitly other computers-->

==Major Design Considerations==

===Transparency===
Transparency, simply put, is the quality of a distributed system to be seen and understood as a single-system image; it is by far the greatest overriding consideration in the high-level conceptual design of a distributed operating system. While a simple concept, this one issue touches and affects decision making in almost every aspect of design, introducing requirements and/or restrictions on those aspects, and often on their relationships with others.

[[Inter-process communication]] (IPC) is the critical complement to transparency, as low-level IPC implementation underpins it. General communications, process interactions, and data flows all depend on IPC sub-systems. Each situation requires fast, efficient, and reliable exchange capabilities, which in turn require both efficient primitives and stable protocols. While this often leads to various scenario-specific solutions, the calling interface must remain consistent.

===Process Management===
Process management is a global system concept which provides mechanisms for effective and efficient use and sharing of processing resources throughout the system. These resources, and operations on them, can be either local or remote; in either event, they must remain completely consistent from the user perspective. For example, load balancing is an important process management function. Deciding which process to move, and when and where to move it, are policy decisions relegated to resource management; but the migration of the process (e.g., moveProcess(fromA, toB)) is a mechanism implementation belonging to process management. The migration, whether local to another core or remote to another computer, must again remain consistent in its presentation to the user. Other functions of this sub-system include the allocation and de-allocation of processes and ports, as well as provisions to run, suspend, and resume execution of a process. Again, these are mechanisms, related only to ''what'' is done, not which one, how, or where.
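The mechanism/policy split described above can be illustrated with a minimal sketch. All names here (the two-node setup, <code>choose_target</code>, <code>move_process</code>) are hypothetical illustrations, not an actual distributed operating system API:

```python
# Illustrative sketch: separating migration *policy* (which process, where,
# when) from the migration *mechanism* (how the process actually moves).
# Node and process names are hypothetical.

class Node:
    def __init__(self, name):
        self.name = name
        self.processes = {}          # pid -> process state

    def load(self):
        return len(self.processes)

def choose_target(nodes):
    """Policy: pick the least-loaded node (one simple strategy of many)."""
    return min(nodes, key=lambda n: n.load())

def move_process(pid, from_node, to_node):
    """Mechanism: transfer process state; *how* it is done, not *why*."""
    state = from_node.processes.pop(pid)
    to_node.processes[pid] = state
    return to_node

a, b = Node("A"), Node("B")
a.processes = {1: "state1", 2: "state2", 3: "state3"}

target = choose_target([a, b])       # policy decides: B is idle
move_process(1, a, target)           # mechanism performs the migration
```

The policy function can be swapped (e.g., for a CPU-utilization heuristic) without touching the mechanism, which mirrors the separation of mechanism and policy discussed earlier.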
 
===Resource Management===
System resources such as memory, files, and devices are distributed throughout a system, and at any given moment any of the system's nodes may have light or idle workloads. Load sharing and load balancing require many policy-oriented decisions, ranging from finding idle CPUs to deciding when and which process to move. Many algorithms exist to aid in these decisions; however, this calls for a second level of decision-making policy: choosing the algorithm best suited to the scenario, and to the conditions surrounding it.

===Reliability===
One of the basic tenets of distributed systems is a high level of reliability. This quality attribute of a distributed system has become a staple expectation. Reliability is most often considered from the perspectives of availability and security of a system's hardware, services, and data. Issues arising from availability failures or security violations are considered faults. Faults are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults. There are four general methods of dealing with faults: fault avoidance, fault tolerance, fault detection, and fault recovery. Fault avoidance comprises proactive measures taken to minimize the occurrence of faults; fault tolerance is the ability of a system to continue some level of operation in the face of a fault. In the event a fault does occur, the system should detect it and have the capability to respond quickly and effectively to recover full functionality.

===Performance===
Performance is arguably the quintessential computing concern, and the distributed system is no different. Many benchmark metrics exist for performance: throughput, job completions per unit time, system utilization, etc. Each of these benchmarks is more meaningful in describing some scenarios and less so in others. With respect to a distributed system, this consideration most often distills to a balance between process parallelism and IPC. Managing the task granularity of parallelism in a sensible relation to the messages required to support it is extremely effective. Likewise, identifying when it is more beneficial to migrate a process to its data, rather than copy the data, is effective as well. Many process- and resource-management algorithms work to maximize performance.

===Intercommunicating Cells===
One early effort at abstracting memory access was Intercommunicating Cells, where a cell was composed of a collection of [[Computer data storage|memory]] elements. A memory element was basically a binary electronic [[flip-flop (electronics)|flip-flop]] or [[relay]]. Within a cell there were two types of elements, ''symbol'' and ''cell''. Each cell structure stores [[data]] in a [[String (computer science)|string]] of symbols, consisting of a [[Identifier|name]] and a set of [[parameter]]s. Information is linked through cell associations.<ref name=intercomm_cells/>

The theory contended that addressing is a wasteful and non-valuable [[indirection|level of indirection]]. Information was accessed in two ways, direct and cross-retrieval. Direct retrieval accepts a name and returns a parameter set. Cross-retrieval [[Projection (mathematics)|projects]] through parameter sets and returns a set of names containing the given [[subset]] of parameters. This was similar to a modified [[hash table]] [[data structure]] that allowed multiple [[Value (mathematics)|values]] (parameters) for each [[Unique key|key]] (name).

{| style="width=100%;"
|- valign="top"
| colspan="3" |Cellular memory would have many advantages:
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || A major portion of a system's [[Boolean logic|logic]] is distributed within the associations of information stored in the cells,
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || This flow of information association is somewhat guided by the act of storing and retrieving,
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || The time required for storage and [[Information retrieval|retrieval]] is mostly [[constant time|constant]] and completely unrelated to the size and fill-factor of the memory
|- valign="top"
| width="20px" | || width="10px" | [[File:Writing bullet.svg|top]] || Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size
|}
 
This [[Computer configuration|configuration]] was ideal for distributed systems. The constant-time projection through memory for storing and retrieval was inherently [[Atomic operation|atomic]] and [[Mutual exclusion|exclusive]]. The cellular memory's intrinsic distributed characteristics<!-- are these intrinsically distributed or merely abstract?--> would be invaluable. The impact on the [[User interface|user]], [[Computer hardware|hardware]]/[[Peripheral|device]], or [[Application programming interface]]s was indirect. The authors were considering distributed systems, stating:

{{blockquote|We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.|Chung-Yeol (C. Y.) Lee|''Intercommunicating Cells, Basis for a Distributed Logic Computer''}}

===Synchronization===
Cooperating concurrent processes have an inherent need for synchronization. Three basic situations define the scope of this need: one or more processes must synchronize at a given point for one or more other processes to continue; one or more processes must wait for an asynchronous condition in order to continue; or a process must establish mutually exclusive access to a shared resource. A multitude of algorithms is available for these scenarios and their many variations. Unfortunately, whenever synchronization is required, the opportunity for process deadlock usually exists. The ancillary situation of deadlock is covered below.

===Flexibility===
Flexibility in a distributed system is made possible through the modular characteristics of the microkernel. With the microkernel presenting a minimal but complete set of primitives and basic, functionally cohesive services, the higher-level management components can be composed in a similarly functionally cohesive manner. This capability leads to exceptional flexibility in the collection of management components; more importantly, it allows the opportunity to dynamically swap, upgrade, or install additional components above the kernel.
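This dynamic swapping of user-level components above a fixed kernel interface can be sketched minimally. The service registry, the scheduler classes, and the task records are all hypothetical illustrations, not any real microkernel's API:

```python
# Illustrative sketch of microkernel-style flexibility: user-level
# management components registered above a minimal kernel can be
# swapped at runtime. All names here are hypothetical.

class FifoScheduler:
    def pick(self, ready):
        return ready[0]                      # naive first-come choice

class PriorityScheduler:
    def pick(self, ready):
        return max(ready, key=lambda t: t["prio"])

services = {"scheduler": FifoScheduler()}    # initial component set

def swap_service(name, new_impl):
    """Dynamically replace a management component; no kernel change."""
    services[name] = new_impl

ready = [{"id": 1, "prio": 1}, {"id": 2, "prio": 9}]
first = services["scheduler"].pick(ready)        # FIFO picks task 1
swap_service("scheduler", PriorityScheduler())   # upgrade in place
second = services["scheduler"].pick(ready)       # priority picks task 2
```

The kernel-facing interface (`pick`) stays constant while the policy component behind it is upgraded, which is the flexibility property described above.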
 
===Foundational work===

====Coherent memory abstraction====
{{pad|2em}} Algorithms for scalable synchronization on shared-memory multiprocessors <ref>Mellor-Crummey, J. M. and Scott, M. L. 1991. [https://dl.acm.org/doi/abs/10.1145/103727.103729 Algorithms for scalable synchronization on shared-memory multiprocessors]. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.</ref>

====File System abstraction====
{{pad|2em}}Measurements of a distributed file system<ref>Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. [http://people.csail.mit.edu/ledlie/resources/papers/1991/baker.ps Measurements of a distributed file system]. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13–16, 1991). SOSP '91. ACM, New York, NY, 198-212.</ref>
<br />{{pad|2em}}Memory coherence in shared virtual memory systems <ref>Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.</ref>

==Transparency Responsibilities==

===Location Transparency===
The system should create and maintain the user's perception and understanding of the entirety of the system, its devices, and resources as local entities. At no point in any user's system experience should the user be expected to know, or account for, the actual ___location of any of these entities.

===Access Transparency===
System entities and processes should maintain a consistent access/entry mechanism, regardless of whether they are local or remote.
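A minimal sketch of this uniform-access idea follows. The file classes and the in-memory "remote node" are hypothetical stand-ins; a real system would hide an actual network round trip behind the same interface:

```python
# Illustrative sketch of access transparency: one calling interface for
# a resource, whether it resolves locally or on a (simulated) remote
# node. All classes and paths here are hypothetical.

class LocalFile:
    def __init__(self, data):
        self._data = data
    def read(self):
        return self._data

class RemoteFile:
    def __init__(self, node, path):
        self._node, self._path = node, path
    def read(self):
        # stand-in for a network round trip to the owning node
        return self._node[self._path]

node_b = {"/etc/motd": "hello from B"}        # simulated remote store

def open_file(path, ___location=None):
    """Same entry mechanism regardless of where the file lives."""
    if ___location is None:
        return LocalFile("local data")
    return RemoteFile(___location, path)

files = [open_file("/tmp/x"), open_file("/etc/motd", ___location=node_b)]
contents = [f.read() for f in files]           # identical access mechanism
```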
 
===Migration Transparency===
Resources and processes can be migrated, without user knowledge, by the system to another node in an attempt to maximize efficiency, reliability, and security. This requires policy decision-making abilities and naming stability; in the event of a process migration, all IPC messages must be received, or held pending, until the migration completes.

===Replication Transparency===
System entities can be copied to strategic points in the system to increase efficiency through better proximity, and to provide improved reliability, with each distributed replica serving as a backup; both prompted by dynamic strategy.

====Transaction abstraction====
{{pad|2em}}''Transactions''
<br />{{pad|4em}} Sagas <ref>Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27–29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.</ref>

{{pad|2em}}''Transactional Memory''
<br />{{pad|4em}}Composable memory transactions<ref>Harris, T., Marlow, S., [[Simon Peyton Jones|Peyton-Jones, S.]], and Herlihy, M. 2005. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.77.3476&rep=rep1&type=pdf Composable memory transactions]. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15–17, 2005). PPoPP '05. ACM, New York, NY, 48-60.</ref>
<br />{{pad|4em}}Transactional memory: architectural support for lock-free data structures <ref>Herlihy, M. and Moss, J. E. 1993. [http://hpl.americas.hp.net/techreports/Compaq-DEC/CRL-92-7.pdf Transactional memory: architectural support for lock-free data structures]. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16–19, 1993). ISCA '93. ACM, New York, NY, 289-300.</ref>
<br />{{pad|4em}}Software transactional memory for dynamic-sized data structures<ref>Herlihy, M., Luchangco, V., Moir, M., and Scherer, W. N. 2003. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.59.8787&rep=rep1&type=pdf Software transactional memory for dynamic-sized data structures]. In Proceedings of the Twenty-Second Annual Symposium on Principles of Distributed Computing (Boston, Massachusetts, July 13–16, 2003). PODC '03. ACM, New York, NY, 92-101.</ref>
<br />{{pad|4em}}Software transactional memory<ref>Shavit, N. and Touitou, D. 1995. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.474.5928&rep=rep1&type=pdf Software transactional memory]. In Proceedings of the Fourteenth Annual ACM Symposium on Principles of Distributed Computing (Ottawa, Ontario, Canada, August 20–23, 1995). PODC '95. ACM, New York, NY, 204-213.</ref>
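The core idea common to the transaction work cited above can be sketched in a few lines: tentative writes stay invisible until an all-or-nothing commit. This is a simplified illustration, not any of the referenced designs (in particular, it omits conflict detection between concurrent transactions):

```python
# Illustrative sketch of the transaction idea: writes are buffered and
# applied atomically on commit, or discarded on abort. The backing
# store and keys are hypothetical.

store = {"x": 1, "y": 2}

class Transaction:
    def __init__(self, backing):
        self.backing = backing
        self.writes = {}                  # buffered, not yet visible

    def read(self, key):
        return self.writes.get(key, self.backing[key])

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        self.backing.update(self.writes)  # all-or-nothing apply
        self.writes = {}

    def abort(self):
        self.writes = {}                  # nothing ever becomes visible

t = Transaction(store)
t.write("x", 10)
in_txn = t.read("x")                      # sees its own tentative write
outside = store["x"]                      # others still see the old value
t.commit()                                # update becomes globally visible
```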
 
===Concurrency Transparency===
The system should possess and exhibit properties that allow multiple simultaneous uses of system resources by users who are kept unaware of the concurrent usage. Required properties include synchronization mechanisms to keep events ordered and consistent, mutual-exclusion management for resources, and capabilities sufficient to detect and recover from both starvation and deadlock.

===Parallel Transparency===
The system should have stable performance characteristics, even if some nodes increase rapidly in workload, through properties of migration, replication, and concurrency. This requires an intelligent policy-decision strategy to facilitate the timely and accurate allocation, migration, and disposition of resources.

===Failure Transparency===
The system should shield users from knowledge of, and the effects resulting from, failures. In the event of a partial failure, the system is responsible for rapid and accurate detection, and for orchestrating a remedy with little, if any, imposition on users. These methods can range from static proactive posturing to dynamic, more flexible response mechanisms.

====Persistence abstraction====
{{pad|2em}}OceanStore: an architecture for global-scale persistent storage <ref>Kubiatowicz, J., Bindel, D., Chen, Y., Czerwinski, S., Eaton, P., Geels, D., Gummadi, R., Rhea, S., Weatherspoon, H., Wells, C., and Zhao, B. 2000. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.439.4822&rep=rep1&type=pdf OceanStore: an architecture for global-scale persistent storage]. In Proceedings of the Ninth international Conference on Architectural Support For Programming Languages and Operating Systems (Cambridge, Massachusetts, United States). ASPLOS-IX. ACM, New York, NY, 190-201.</ref>

====Coordinator abstraction====
{{pad|2em}} Weighted voting for replicated data <ref>Gifford, D. K. 1979. [http://pages.cs.wisc.edu/~remzi/Classes/739/Spring2004/Papers/p150-gifford.pdf Weighted voting for replicated data]. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10–12, 1979). SOSP '79. ACM, New York, NY, 150-162.</ref>
<br />{{pad|2em}} Consensus in the presence of partial synchrony <ref>Dwork, C., Lynch, N., and Stockmeyer, L. 1988. [https://groups.csail.mit.edu/tds/papers/Lynch/MIT-LCS-TM-270.pdf Consensus in the presence of partial synchrony]. J. ACM 35, 2 (Apr. 1988), 288-323.</ref>
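The quorum idea behind Gifford's weighted voting, cited in this section, can be sketched briefly: each copy holds votes, and the read and write quorums are chosen so they must overlap (r + w greater than the total votes), guaranteeing every read intersects the latest write. The vote assignments below are arbitrary illustrative values:

```python
# Illustrative sketch of weighted-voting quorums: r + w > total votes
# ensures any read quorum overlaps any write quorum. Vote assignments
# per copy are hypothetical.

votes = {"A": 2, "B": 1, "C": 1}     # votes per copy; total = 4
total = sum(votes.values())
r, w = 2, 3                          # chosen so that r + w > total

def gather(quorum_size):
    """Collect copies until their combined votes reach the quorum."""
    group, acc = [], 0
    for node, v in votes.items():
        group.append(node)
        acc += v
        if acc >= quorum_size:
            return group
    raise RuntimeError("quorum unreachable")

write_set = gather(w)                # copies that must apply a write
read_set = gather(r)                 # copies consulted on a read
overlap = set(write_set) & set(read_set)   # guaranteed non-empty
```

Because r + w exceeds the vote total, `overlap` can never be empty, so a reader always contacts at least one copy holding the most recent write.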
 
===Performance Transparency===
The system should create and maintain a reasonable, stable, and predictable performance expectation for the user, one that is both resilient to and helpful in situations where parts of the system may experience significant delay or even failure. While reasonable and predictable performance is important, there should be no inherent expectation or expressed indication of fairness or equality.

===Name Transparency===
All system entities should maintain a complete decoupling of entity naming from any spatial or temporal ___location, as well as from any other system entity.

===Size/Scale Transparency===
A user's experience or perception of their system should remain stable and consistent in the face of system extension, scaling, or waning due to failure.

===Revision Transparency===
System users should be completely oblivious to system-software version changes and to changes in the internal implementation of system infrastructure. While a user may become aware of, or discover, the availability of a new function or service, the implementation or alteration of the system's internal structure should in no way prompt this discovery.

====Reliability abstraction====
{{pad|2em}}''Sanity checks''
<br />{{pad|4em}}The Byzantine Generals Problem <ref>Lamport, L., Shostak, R., and Pease, M. 1982. [http://people.cs.uchicago.edu/~shanlu/teaching/33100_wi15/papers/byz.pdf The Byzantine Generals Problem]. ACM Trans. Program. Lang. Syst. 4, 3 (Jul. 1982), 382-401.</ref>
<br />{{pad|4em}}Fail-stop processors: an approach to designing fault-tolerant computing systems <ref>Schlichting, R. D. and Schneider, F. B. 1983. Fail-stop processors: an approach to designing fault-tolerant computing systems. ACM Trans. Comput. Syst. 1, 3 (Aug. 1983), 222-238.</ref>

{{pad|2em}}''Recoverability''
<br />{{pad|4em}}Distributed snapshots: determining global states of distributed systems<ref>Chandy, K. M. and Lamport, L. 1985. Distributed snapshots: determining global states of distributed systems. ACM Trans. Comput. Syst. 3, 1 (Feb. 1985), 63-75.</ref>
<br />{{pad|4em}}Optimistic recovery in distributed systems <ref>Strom, R. and Yemini, S. 1985. Optimistic recovery in distributed systems. ACM Trans. Comput. Syst. 3, 3</ref>
 
===Control Transparency===
All system constants, properties, configuration settings, etc., should be completely consistent in appearance, connotation, and denotation to all users and software applications aware of them.

===Data Transparency===
No system data entity should expose itself as peculiar when required to interact remotely.

==Distributed computing models==
{{More citations needed section|date=January 2012}}
 
===Three basic distributions===
To better illustrate this point, examine three system [[Software architecture|architectures]]; centralized, decentralized, and distributed. In this examination, consider three structural aspects: organization, connection, and control. Organization describes a system's physical arrangement characteristics. Connection covers the communication pathways among nodes. Control manages the operation of the earlier two considerations.
 
====Organization====
A [[Centralized computing|centralized system]] has one level of structure, where all constituent elements directly depend upon a single control element. A [[decentralized system]] is hierarchical. The bottom level unites subsets of a system's entities. These entity subsets in turn combine at higher levels, ultimately culminating at a central master element. A distributed system is a collection of autonomous elements with no concept of levels.
 
====Connection====
Centralized systems connect constituents directly to a central master entity in a hub and spoke fashion. A decentralized system (aka [[Network operating system|network system]]) incorporates direct and indirect paths between constituent elements and the central entity. Typically this is configured as a hierarchy with only one shortest path between any two elements. Finally, the distributed operating system requires no pattern; direct and indirect connections are possible between any two elements. Consider the 1970s phenomena of “[[string art]]” or a [[spirograph]] drawing as a [[Fully connected network|fully connected system]], and the [[spider web|spider's web]] or the [[Interstate Highway System]] between U.S. cities as examples of a ''partially connected system''.
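The three connection patterns can be contrasted with a small adjacency sketch; the node labels and four-node scale are purely illustrative:

```python
# Illustrative sketch of the three connection patterns as edge sets:
# centralized hub-and-spoke, decentralized hierarchy, and a fully
# connected distributed mesh over the same four nodes.

from itertools import combinations

nodes = ["n1", "n2", "n3", "hub"]

# Centralized: every element links directly to the single central entity.
centralized = {(n, "hub") for n in nodes if n != "hub"}

# Decentralized: a hierarchy; the leaves reach the root only through n1.
decentralized = {("n2", "n1"), ("n3", "n1"), ("n1", "hub")}

# Distributed (fully connected): any pair of elements may link directly.
distributed = set(combinations(nodes, 2))

link_counts = (len(centralized), len(decentralized), len(distributed))
```

A partially connected distributed system would simply drop some of the mesh edges while keeping every node reachable, as in the highway-system analogy above.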
 
====Control====
With a cursory glance around the internet, or a modest perusal of pertinent writings, one could very easily gain the notion that computer operating systems were a new phenomenon in the mid-twentieth century. In fact, important research in operating systems was being conducted at this time.<ref>Dreyfuss, P. 1958. System design of the Gamma 60. In Proceedings of the May 6-8, 1958, Western Joint Computer Conference: Contrasts in Computers (Los Angeles, California, May 06 - 08, 1958). IRE-ACM-AIEE '58 (Western). ACM, New York, NY, 130-133. </ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9-13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09 - 13, 1957). IRE-ACM-AIEE '57</ref><ref>Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.</ref><ref>Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3-5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03 - 05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335. </ref><ref>Estrin, G. 1960. Organization of computer systems: the fixed plus variable structure computer. In Papers Presented At the May 3-5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03 - 05, 1960). IRE-AIEE-ACM '60 (Western). 
ACM, New York, NY, 33-40.</ref> While early exploration into operating systems took place in the years leading to 1950; shortly afterward, highly advanced research began on new systems to conquer new problems. In the first decade of the second-half of the [[20th century]], many new questions were asked, many new problems were identified, many solutions were developed and working for years, in controlled production environments.
Centralized and decentralized systems have directed [[Software flow control|flows of connection]] to and from the central entity, while distributed systems communicate along arbitrary paths. This is the pivotal notion of the third consideration. Control involves allocating tasks and data to system elements balancing efficiency, responsiveness, and complexity.
 
Centralized and decentralized systems offer more control, potentially easing administration by limiting options. Distributed systems are more difficult to explicitly control, but scale better horizontally and offer fewer points of system-wide failure. The associations conform to the needs imposed by its design but not by organizational chaos
==== Aboriginal Distributed Computing ====
'''The DYSEAC'''<ref>Leiner, A. L. 1954. System Specifications for the DYSEAC. J. ACM 1, 2 (Apr. 1954), 57-81.</ref> (1954)
 
One of the first solutions to these new questions was the [[DYSEAC]], a self-described general-purpose [[Synchronization (computer science)|synchronous]] computer which, even at this point in history, exhibited signs of being much more than general-purpose. In one of the earliest publications of the [[ACM]], in April 1954, a researcher at the [[National Bureau of Standards]] – now the National [[nist|Institute of Standards and Technology]] ([[nist|NIST]]) – presented a detailed implementation design specification of the DYSEAC. Without carefully reading the entire specification, one could be misled by summary language in the introduction as to the nature of this machine. The initial section of the introduction advises that major emphasis will be focused upon the requirements of the intended applications, and that these applications would require flexible communication. However, the suggestion that the external devices could be typewriters, [[Magnetic storage|magnetic media]], and [[Cathode ray tube|CRTs]], together with the term “[[Input/output|input-output operation]]” being used more than once, could quickly limit any paradigm of this system to a complex centralized “ensemble.” Seemingly saving the best for last, the author eventually describes the true nature of the system.

{{quote|Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}

While this more detailed description elevates the perception of the system, the best that can be distilled from it is some semblance of decentralized control. The avid reader, persevering in the investigation, would reach the point at which the real nature of the system is divulged.

{{quote|Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …it should be noted that the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}

This is one of the earliest examples of a computer with distributed control. [[United States Department of the Army|Dept. of the Army]] reports<ref>Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961</ref> show it was certified reliable and passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. It was also a [[portable computer]]: it was housed in a [[Tractor-trailer#Types_of_trailers|tractor-trailer]] and accompanied by 2 attendant vehicles and [[Refrigerator truck|6 tons of refrigeration]] capacity.

==Design considerations==

===Transparency===
''Transparency'' or ''single-system image'' refers to the ability of an application to treat the system on which it operates without regard to whether it is distributed and without regard to hardware or other implementation details. Many areas of a system can benefit from transparency, including access, ___location, performance, naming, and migration. The consideration of transparency directly affects decision making in every aspect of design of a distributed operating system. Transparency can impose certain requirements and/or restrictions on other design considerations.

Systems can optionally violate transparency to varying degrees to meet specific application requirements. For example, a distributed operating system may present a hard drive on one computer as "C:" and a drive on another computer as "G:". The user does not require any knowledge of device drivers or the drive's ___location; both devices work the same way, from the application's perspective. A less transparent interface might require the application to know which computer hosts the drive. Transparency domains:
* ''Location transparency'' – Location transparency comprises two distinct aspects of transparency, naming transparency and user mobility. Naming transparency requires that nothing in the physical or logical references to any system entity should expose any indication of the entity's ___location, or its local or remote relationship to the user or application. User mobility requires the consistent referencing of system entities, regardless of the system ___location from which the reference originates.<ref name="Sinha1997" />{{rp|20}}
* ''Access transparency'' – Local and remote system entities must remain indistinguishable when viewed through the user interface. The distributed operating system maintains this perception through the exposure of a single access mechanism for a system entity, regardless of that entity being local or remote to the user. Transparency dictates that any differences in methods of accessing any particular system entity—either local or remote—must be both invisible to, and undetectable by the user.<ref name="Gościński1991"/>{{rp|84}}
* ''Migration transparency'' – Resources and activities migrate from one element to another controlled solely by the system and without user/application knowledge or action.<ref name="Galli2000">{{cite book|last=Galli|first=Doreen L.|title=Distributed Operating Systems: Concepts and Practice|url=https://archive.org/details/distributedopera00gall |url-access=registration|year=2000|publisher=Prentice Hall|isbn=978-0-13-079843-5}}</ref>{{rp|16}}
* ''Replication transparency'' – The process or fact that a resource has been duplicated on another element occurs under system control and without user/application knowledge or intervention.<ref name="Galli2000" />{{rp|16}}
* ''Concurrency transparency'' – Users/applications are unaware of and unaffected by the presence/activities of other users.<ref name="Galli2000" />{{rp|16}}
* ''Failure transparency'' – The system is responsible for detection and remediation of system failures. No user knowledge/action is involved other than waiting for the system to resolve the problem.<ref name="Chow1997" />{{rp|30}}
* ''Performance transparency'' – The system is responsible for the detection and remediation of local or global performance shortfalls. Note that system policies may prefer some users/user classes/tasks over others. No user knowledge or interaction is involved.<ref name="Sinha1997" />{{rp|23}}
* ''Size/Scale transparency'' – The system is responsible for managing its geographic reach, number of nodes, and level of node capability without any required user knowledge or interaction.<ref name="Sinha1997" />{{rp|23}}
* ''Revision transparency'' – The system is responsible for upgrades, revisions, and changes to system infrastructure without user knowledge or action.<ref name="Chow1997" />{{rp|30}}
* ''Control transparency'' – The system is responsible for providing all system information, constants, properties, configuration settings, etc. in a consistent appearance, connotation, and denotation to all users and applications.<ref name="Gościński1991"/>{{rp|84}}
* ''Data transparency'' – The system is responsible for providing data to applications without user knowledge or action relating to where the system stores it.<ref name="Gościński1991"/>{{rp|85}}
* ''Parallelism transparency'' – The system is responsible for exploiting any ability to parallelize task execution without user knowledge or interaction. Arguably the most difficult aspect of transparency, and described by Tanenbaum as the "Holy grail" for distributed system designers.<ref name="Tanenbaum1995">{{cite book|last=Tanenbaum|first=Andrew S.|title=Distributed Operating Systems|url=https://archive.org/details/unset0000unse_h1q3|url-access=registration|year=1995|publisher=Prentice Hall|isbn=978-0-13-219908-7}}</ref>{{rp|23–25}}

===Inter-process communication===
[[Inter-Process Communication]] (IPC) is the implementation of general communication, process interaction, and [[dataflow]] between [[Thread (computer science)|threads]] and/or [[Process (computing)|processes]] both within a node, and between nodes in a distributed OS. The intra-node and inter-node communication requirements drive low-level IPC design, which is the typical approach to implementing communication functions that support transparency. In this sense, inter-process communication is the greatest underlying concept in the low-level design considerations of a distributed operating system.

===Process management===
[[Process management (computing)|Process management]] provides policies and mechanisms for effective and efficient sharing of resources between distributed processes. These policies and mechanisms support operations involving the allocation and de-allocation of processes and ports to processors, as well as mechanisms to run, suspend, migrate, halt, or resume process execution. While these resources and operations can be either local or remote with respect to each other, the distributed OS maintains state and synchronization over all processes in the system.
 
As an example, [[Load balancing (computing)|load balancing]] is a common process management function. Load balancing monitors node performance and is responsible for shifting activity across nodes when the system is out of balance. One load balancing function is picking a process to move. The kernel may employ several selection mechanisms, including priority-based choice. This mechanism chooses a process based on a policy such as 'newest request', which the system then carries out through its migration machinery.
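The policy/mechanism split above can be sketched in a few lines. This is a minimal illustration with hypothetical names, not a real kernel API: a selection mechanism applies a 'newest request' policy, and a rebalance step migrates one process from the busiest node to the idlest.

```python
# Illustrative load-balancing sketch (hypothetical names, not a kernel API).
from dataclasses import dataclass

@dataclass
class Proc:
    pid: int
    arrival: float   # time the process was created
    load: float      # CPU share it currently consumes

def pick_process(procs, policy="newest"):
    """Selection mechanism: the policy decides which process migrates."""
    if policy == "newest":
        return max(procs, key=lambda p: p.arrival)
    if policy == "heaviest":
        return max(procs, key=lambda p: p.load)
    raise ValueError(policy)

def rebalance(nodes, threshold=0.25):
    """Move one process from the busiest to the idlest node when the
    load gap exceeds the threshold; return the migrated process."""
    busiest = max(nodes, key=lambda n: sum(p.load for p in nodes[n]))
    idlest = min(nodes, key=lambda n: sum(p.load for p in nodes[n]))
    gap = (sum(p.load for p in nodes[busiest])
           - sum(p.load for p in nodes[idlest]))
    if gap > threshold and nodes[busiest]:
        victim = pick_process(nodes[busiest])
        nodes[busiest].remove(victim)
        nodes[idlest].append(victim)
        return victim
    return None

nodes = {"n1": [Proc(1, 0.0, 0.6), Proc(2, 5.0, 0.3)],
         "n2": [Proc(3, 1.0, 0.1)]}
moved = rebalance(nodes)   # migrates pid 2, the newest process on the busy node
```

Swapping the policy string changes which process moves without touching the migration mechanism, which is the point of separating the two.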
===Resource management===
[[Resource (computer science)|Systems resources]] such as memory, files, devices, etc. are distributed throughout a system, and at any given moment, any of these nodes may have light to idle workloads. ''Load sharing'' and load balancing require many policy-oriented decisions, ranging from finding idle CPUs, when to move, and which to move. Many [[algorithm]]s exist to aid in these decisions; however, this calls for a second level of decision making policy in choosing the algorithm best suited for the scenario, and the conditions surrounding the scenario.<!--how is this different from process management?-->

===Reliability===
Distributed OS can provide the necessary resources and services to achieve high levels of ''reliability'', or the ability to prevent and/or recover from errors. [[Fault (technology)|Faults]] are physical or logical defects that can cause errors in the system. For a system to be reliable, it must somehow overcome the adverse effects of faults.

The primary methods for dealing with faults include ''fault avoidance'', [[Fault-tolerant design|fault tolerance]], and ''fault detection and recovery''. Fault avoidance covers proactive measures taken to minimize the occurrence of faults. These proactive measures can be in the form of ''[[transaction processing|transactions]]'', [[Replication (computer science)|replication]] and [[Replication (computer science)#Primary-backup and multi-primary replication|backups]]. Fault tolerance is the ability of a system to continue operation in the presence of a fault. In the event of a fault, the system should detect it and recover full functionality. In any event, any actions taken should make every effort to preserve the ''single system image''.

===Availability===
[[Availability]] is the fraction of time during which the system can respond to requests. It is commonly estimated as the mean time to failure (MTTF) divided by the sum of the mean time to failure and the mean time to repair (MTTR).

===Performance===
Many [[Benchmark (computing)|benchmark metrics]] quantify [[Computer performance|performance]]; throughput, response time, job completions per unit time, system utilization, etc. With respect to a distributed OS, performance most often distills to a balance between [[Parallel computing|process parallelism]] and IPC.{{Citation needed|date=January 2012}} Managing the [[Granularity#In computing|task granularity]] of parallelism in a sensible relation to the messages required for support is extremely effective.{{Citation needed|date=January 2012}} Also, identifying when it is more beneficial to [[Process migration|migrate a process]] to its data, rather than copy the data, is effective as well.{{Citation needed|date=January 2012}}

===Synchronization===
Cooperating [[Concurrent computing|concurrent processes]] have an inherent need for [[Synchronization (computer science)|synchronization]], which ensures that changes happen in a correct and predictable fashion. Three basic situations define the scope of this need:

:* one or more processes must synchronize at a given point for one or more other processes to continue,
:* one or more processes must wait for an asynchronous condition in order to continue,
:* or a process must establish exclusive access to a shared resource.

Improper synchronization can lead to multiple failure modes including loss of [[ACID|atomicity, consistency, isolation and durability]], [[Deadlock (computer science)|deadlock]], [[livelock]] and loss of [[serializability]].{{Citation needed|date=January 2012}}

===Flexibility===
[[Flexibility (engineering)|Flexibility]] in a distributed operating system is enhanced through the modular characteristics of the distributed OS, and by providing a richer set of higher-level services. The completeness and quality of the kernel/microkernel simplifies implementation of such services, and potentially gives applications a greater choice of providers for such services.{{Citation needed|date=April 2012}}

==== Multi-programming abstraction ====
'''The Lincoln TX-2'''<ref>Forgie, J. W. 1957. The Lincoln TX-2 input-output system. In Papers Presented At the February 26-28, 1957, Western Joint Computer Conference: Techniques For Reliability (Los Angeles, California, February 26 - 28, 1957). IRE-AIEE-ACM '57 (Western). ACM, New York, NY, 156-160.</ref> (1957)

Described as an input-output system of experimental nature, the Lincoln TX-2 placed a premium on flexibility in its association of simultaneously operational input-output devices. The design of the TX-2 was modular, supporting a high degree of modification and expansion, as well as flexibility in the operation and programming of its devices. The system employed the Multiple-Sequence Program Technique.

This technique allowed multiple program counters to each associate with one of 32 possible sequences of program code. These explicitly prioritized sequences could be interleaved and executed concurrently, affecting not only the computation in process, but also the control flow of sequences and switching of devices. Much of the paper's discussion relates to the complexity and sophistication of the sequence capabilities of its devices.

Similar to the previous system, the TX-2 discussion has a distinct decentralized theme until it is revealed that efficiencies in system operation are gained when separate programmed devices are operated simultaneously. It is also stated that the full power of the central unit can be utilized by any device, and it may be used for as long as the device's situation requires. In this, we see the TX-2 as another example of a system exhibiting distributed control, its central unit not having dedicated control.

==== Memory access abstraction ====
'''Intercommunicating Cells, Basis for a Distributed Logic Computer'''<ref>Lee, C. Y. 1962. Intercommunicating cells, basis for a distributed logic computer. In Proceedings of the December 4-6, 1962, Fall Joint Computer Conference (Philadelphia, Pennsylvania, December 04 - 06, 1962). AFIPS '62 (Fall).</ref> (1962)

One early memory access paradigm was Intercommunicating Cells, where a cell is composed of a collection of [[Computer data storage|memory]] elements. A memory element was basically an electronic [[flip-flop]] or [[relay]], capable of holding one of two possible values. Within a cell there are two types of elements, symbol and cell elements. Each cell structure stores [[data]] in a [[String (computer science)|string]] of symbols, consisting of a [[Identifier|name]] and a set of associated [[parameter]]s. Consequently, a system's information is linked through various associations of cells.

Intercommunicating Cells fundamentally broke from tradition in that it had no [[Program counter|counter]]s or any concept of [[Memory address|addressing memory]]. The theory contends that addressing is a wasteful and non-valuable [[indirection|level of indirection]]. Information is accessed in two ways, direct and cross-retrieval. Direct retrieval looks to a name and returns a parameter set. Cross-retrieval [[Projection (mathematics)|projects]] through parameter sets and returns a set of names containing the given [[subset]] of parameters. This would be similar to a modified [[hash table]] [[data structure]] that would allow for multiple [[Value (mathematics)|values]] (parameters) for each [[Unique key|key]] (name).

Cellular memory would have many advantages:
* A major portion of a system's [[Boolean logic|logic]] is distributed within the associations of information stored in the cells,
* This flow of information association is somewhat guided by the act of storing and retrieving,
* The time required for storage and [[Information retrieval|retrieval]] is mostly [[constant time|constant]] and completely unrelated to the size and fill-factor of the memory,
* Cells are logically indistinguishable, making them both flexible to use and relatively simple to extend in size.

This early research into alternative memory describes a [[Computer configuration|configuration]] ideal for the distributed operating system. The constant-time projection through memory for storing and retrieval would be inherently [[Atomic operation|atomic]] and [[Mutual exclusion|exclusive]]. The cellular memory's intrinsic distributed characteristics would be an invaluable benefit; however, the impact on the [[User interface|user]], [[hardware]]/[[Peripheral|device]], or [[Application programming interface]]s is uncertain. It is distinctly obvious that these early researchers had a distributed system concept in mind, as they state:

{{quote|We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.|Chung-Yeol (C. Y.) Lee|''Intercommunicating Cells, Basis for a Distributed Logic Computer''}}
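The three synchronization situations listed under ''Synchronization'' above map directly onto standard threading primitives. The following is an illustrative single-machine sketch, not distributed-OS code: a barrier for synchronizing at a point, an event for waiting on an asynchronous condition, and a lock for exclusive access to a shared resource.

```python
# Illustrative sketch of the three synchronization situations.
import threading

barrier = threading.Barrier(2)   # situation 1: meet at a common point
ready = threading.Event()        # situation 2: wait for an async condition
lock = threading.Lock()          # situation 3: mutual exclusion
counter = 0

def worker():
    global counter
    barrier.wait()               # block until both workers arrive
    ready.wait()                 # block until the main thread signals
    for _ in range(1000):
        with lock:               # serialize updates to shared state
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
ready.set()                      # raise the asynchronous condition
for t in threads:
    t.join()
print(counter)                   # 2000: no lost updates
```

Removing the lock can make the final count nondeterministic, which is exactly the loss of atomicity and serializability noted above.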
 
==Research==
==== Component abstraction ====
'''HYDRA: The Kernel of a Multiprocessor Operating System'''<ref>Wulf, W., Cohen, E., Corwin, W., Jones, A., Levin, R., Pierson, C., and Pollack, F. 1974. HYDRA: the kernel of a multiprocessor operating system. Commun. ACM 17, 6 (Jun. 1974), 337-345.</ref> (1974)

{{quote|The design philosophy of HYDRA ... suggest that, at the heart of the system, one should build a collection of facilities of "universal applicability" and "absolute reliability" -- a set of mechanisms from which an arbitrary set of operating system facilities and policies can be conveniently, flexibly, efficiently, and reliably constructed.

Defining a kernel with all the attributes given above is difficult, and perhaps impractical... It is, nevertheless, the approach taken in the HYDRA system. Although we make no claim either that the set of facilities provided by the HYDRA kernel ... we do believe the set provides primitives which are both necessary and adequate for the construction of a large and interesting class of operating environments. It is our view that the set of functions provided by HYDRA will enable the user of C.mmp to create his own operating environment without being confined to predetermined command and file systems, execution scenarios, resource allocation policies, etc.|Wulf et al.|''HYDRA: the kernel of a multiprocessor operating system''}}
 
===Replicated model extended to a component object model===
{{pad|2em}}The Architectural Design of E1 Distributed Operating System<ref>L.B. Ryzhyk, A.Y. Burtsev. Architectural design of E1 distributed operating system. System Research and Information Technologies international scientific and technical journal, October 2004, Kiev, Ukraine.</ref>
<br />{{pad|2em}}The Cronus distributed operating system<ref>Vinter, S. T. and Schantz, R. E. 1986. The Cronus distributed operating system. In Proceedings of the 2nd Workshop on Making Distributed Systems Work (Amsterdam, Netherlands, September 08–10, 1986). EW 2. ACM, New York, NY, 1-3.</ref>
<br />{{pad|2em}}Design and development of MINIX distributed operating system<ref>Ramesh, K. S. 1988. Design and development of MINIX distributed operating system. In Proceedings of the 1988 ACM Sixteenth Annual Conference on Computer Science (Atlanta, Georgia, United States). CSC '88. ACM, New York, NY, 685.</ref>

==== Initial composition ====
'''The National Software Works: A Distributed Processing System'''<ref>Millstein, R. E. 1977. The National Software Works: A distributed processing system. In Proceedings of the 1977 Annual Conference, ACM '77. ACM, New York, NY, 44-52.</ref> (1975)
 
===Complexity/Trust exposure through accepted responsibility===
{{quote|The National Software Works (NSW) is a significant new step in the development of distributed processing systems and computer networks. NSW is an ambitious project to link a set of geographically distributed and diverse hosts with an operating system which appears as a single entity to a prospective user.|R. E. Millstein|''The National Software Works''}}
:Scale and performance in the Denali isolation kernel.<ref>Whitaker, A., Shaw, M., and Gribble, S. D. 2002. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation</ref>
 
===Multi/Many-core focused systems===
:The multikernel: a new OS architecture for scalable multicore systems.<ref>Baumann, A., Barham, P., Dagand, P., Harris, T., Isaacs, R., Peter, S., Roscoe, T., Schüpbach, A., and Singhania, A. 2009. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11–14, 2009). SOSP '09.</ref>
:Corey: an Operating System for Many Cores.<ref>S. Boyd-Wickizer, H. Chen, R. Chen, Y. Mao, F. Kashoek, R. Morris, A. Pesterev, L. Stein, M. Wu, Y. Dai, Y. Zhang, and Z. Zhang. Proceedings of the 2008 Symposium on Operating Systems Design and Implementation (OSDI), December 2008.</ref>
:Almos: Advanced Locality Management Operating System for cc-NUMA Many-Cores.<ref>Almaless, G. and Wajsbürt, F. 2011. In Proceedings of the 5th national seminar of GDR SoC-SIP, Lyon, France, 2011.</ref>

===Distributed processing over extremes in heterogeneity===
:Helios: heterogeneous multiprocessing with satellite kernels.<ref>Nightingale, E. B., Hodson, O., McIlroy, R., Hawblitzel, C., and Hunt, G. 2009. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles (Big Sky, Montana, USA, October 11–14, 2009). SOSP '09.</ref>

==== Complete instantiation ====
'''The Roscoe Distributed Operating System'''<ref>Solomon, M. H. and Finkel, R. A. 1979. The Roscoe distributed operating system. In Proceedings of the Seventh ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, December 10 - 12, 1979). SOSP '79.</ref> (1979)

{{quote|Roscoe is an operating system implemented at the University of Wisconsin that allows a network of microcomputers to cooperate to provide a general-purpose computing facility. The goal of the Roscoe network is to provide a general-purpose computation resource in which individual resources such as files and processors are shared among processes and control is distributed in a non-hierarchical fashion. All processors are identical. Similarly, all processors run the same operating system kernel. However, they may differ in the peripheral units connected to them. No memory is shared between processors. All communication involves messages explicitly passed between physically connected processors. No assumptions are made about the topology of interconnection.

The decision not to use logical or physical sharing of memory for communication is influenced both by the constraints of currently available hardware and by our perception of cost bottlenecks likely to arise as the number of processors increases.|Solomon and Finkel|''The Roscoe Distributed Operating System''}}
 
===Effective and stable in multiple levels of complexity===
:Tessellation: Space-Time Partitioning in a Manycore Client OS.<ref>Rose Liu, Kevin Klues, and Sarah Bird, University of California at Berkeley; Steven Hofmeyr, Lawrence Berkeley National Laboratory; [[Krste Asanović]] and John Kubiatowicz, University of California at Berkeley. HotPar09.</ref>

=== Foundational Work ===

==== Coherent memory abstraction ====
{{pad|2em}}'''Algorithms for scalable synchronization on shared-memory multiprocessors'''<ref>Mellor-Crummey, J. M. and Scott, M. L. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (Feb. 1991), 21-65.</ref>
<br />{{pad|2em}}'''A √N algorithm for mutual exclusion in decentralized systems'''<ref>Maekawa, M. 1985. A √N algorithm for mutual exclusion in decentralized systems. ACM Trans. Comput. Syst. 3, 2 (May 1985), 145-159.</ref>
 
==See also==
==== File System abstraction ====
* {{annotated link|Distributed computing}}
{{pad|2em}}'''Measurements of a distributed file system'''<ref>Baker, M. G., Hartman, J. H., Kupfer, M. D., Shirriff, K. W., and Ousterhout, J. K. 1991. Measurements of a distributed file system. In Proceedings of the Thirteenth ACM Symposium on Operating Systems Principles (Pacific Grove, California, United States, October 13 - 16, 1991). SOSP '91. ACM, New York, NY, 198-212.</ref>
* {{annotated link|HarmonyOS}}
<br />{{pad|2em}}'''Memory coherence in shared virtual memory systems'''<ref>Li, K. and Hudak, P. 1989. Memory coherence in shared virtual memory systems. ACM Trans. Comput. Syst. 7, 4 (Nov. 1989), 321-359.</ref>
* {{annotated link|OpenHarmony}}
* {{annotated link|BlueOS}}
* {{annotated link|Plan 9 from Bell Labs}}
* {{annotated link|Inferno (operating system)|Inferno}}
* {{annotated link|MINIX}}
* {{annotated link|Single system image}} (SSI)
* {{annotated link|Computer systems architecture}}
* {{annotated link|Multikernel}}
* {{annotated link|Operating System Projects}}
* {{annotated link|Edsger W. Dijkstra Prize in Distributed Computing}}
* {{annotated link|List of distributed computing conferences}}
* {{annotated link|List of volunteer computing projects}}
 
==References==
==== Transaction abstraction ====
{{Reflist}}
{{pad|2em}}''Transactions''
<br />{{pad|4em}}'''Sagas'''<ref>Garcia-Molina, H. and Salem, K. 1987. Sagas. In Proceedings of the 1987 ACM SIGMOD international Conference on Management of Data (San Francisco, California, United States, May 27 - 29, 1987). U. Dayal, Ed. SIGMOD '87. ACM, New York, NY, 249-259.</ref>
 
==Further reading==
{{pad|2em}}''Transactional Memory''
* {{cite book|last1=Chow|first1=Randy|author2=Theodore Johnson|title=Distributed Operating Systems and Algorithms|url=https://books.google.com/books?id=J4MZAQAAIAAJ|year=1997|publisher=Addison Wesley|isbn=978-0-201-49838-7}}
<br />{{pad|4em}}'''Composable memory transactions'''<ref>Harris, T., Marlow, S., Peyton-Jones, S., and Herlihy, M. 2005. Composable memory transactions. In Proceedings of the Tenth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (Chicago, IL, USA, June 15 - 17, 2005). PPoPP '05. ACM, New York, NY, 48-60.</ref>
* {{cite book|last=Sinha|first=Pradeep Kumar |title=Distributed Operating Systems: Concepts and Design|url=https://archive.org/details/distributedopera0000sinh|url-access=registration|year=1997|publisher=IEEE Press|isbn=978-0-7803-1119-0}}
<br />{{pad|4em}}'''Transactional memory: architectural support for lock-free data structures'''<ref>Herlihy, M. and Moss, J. E. 1993. Transactional memory: architectural support for lock-free data structures. In Proceedings of the 20th Annual international Symposium on Computer Architecture (San Diego, California, United States, May 16 - 19, 1993). ISCA '93. ACM, New York, NY, 289-300.</ref>
* {{cite book|last=Galli|first=Doreen L.|title=Distributed Operating Systems: Concepts and Practice|url=https://archive.org/details/distributedopera00gall |url-access=registration|year=2000|publisher=Prentice Hall|isbn=978-0-13-079843-5}}
 
==External links==
{{Prone to spam|date=May 2022}}
<!-- {{No more links}}
 
Please be cautious adding more external links.
 
Wikipedia is not a collection of links and should not be used for advertising.
 
Excessive or inappropriate links will be removed.
 
See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details.
 
If there are already suitable links, propose additions or replacements on the article's talk page.
 
-->
 
{{Distributed operating systems}}
{{Operating system}}
{{Authority control}}
 
{{DEFAULTSORT:Distributed Operating System}}
[[Category:Computer networks]]
[[Category:Distributed operating systems| ]]
[[Category:History of software]]
[[Category:Operating systems]]