Distributed operating system: Difference between revisions

External links: link not working
See also: fixed
Tag: possibly inaccurate edit summary
 
(46 intermediate revisions by 35 users not shown)
Line 1:
{{Short description|Operating system designed to operate on multiple systems over a computer network}}
A '''distributed operating system''' is system software over a collection of independent, [[Computer network|networked]], [[Inter-process communication|communicating]], and physically separate computational nodes. It handles jobs that are serviced by multiple CPUs.<ref name="Tanenbaum1993">{{cite journal |last=Tanenbaum |first=Andrew S |date=September 1993 |title=Distributed operating systems anno 1992. What have we learned so far? |journal=Distributed Systems Engineering |volume=1 |issue=1 |pages=3–10 |doi=10.1088/0967-1846/1/1/001|bibcode=1993DSE.....1....3T |doi-access=free }}</ref> Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners.<ref name="Nutt1992">{{cite book|last=Nutt|first=Gary J.|title=Centralized and Distributed Operating Systems|url=https://archive.org/details/centralizeddistr0000nutt |url-access=registration|year=1992|publisher=Prentice Hall|isbn=978-0-13-122326-4}}</ref> The first is a ubiquitous minimal [[Kernel (operating system)|kernel]], or [[microkernel]], that directly controls that node's hardware. The second is a higher-level collection of ''system management components'' that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications.<ref name="Gościński1991">{{cite book|last=Gościński|first=Andrzej|title=Distributed Operating Systems: The Logical Design|url=https://books.google.com/books?id=ZnYhAQAAIAAJ|year=1991|publisher=Addison-Wesley Pub. Co.|isbn=978-0-201-41704-3}}</ref>
 
The microkernel and the management components collection work together. They support the system's goal of integrating multiple resources and processing functionality into an efficient and stable system.<ref name="Fortier1986">{{cite book|last=Fortier|first=Paul J.|title=Design of Distributed Operating Systems: Concepts and Technology|url=https://books.google.com/books?id=F7QmAAAAMAAJ|year=1986|publisher=Intertext Publications|isbn=9780070216211}}</ref> This seamless integration of individual nodes into a global system is referred to as ''transparency'', or ''[[single system image]]'', describing the illusion given to users that the global system is a single computational entity.<!-- is transparency required for membership in the "dos" group?-->
Line 15 ⟶ 14:
 
===The kernel===
{{Expert needed|Computing|reason=See questions asked as comments in the "kernel" section|date=January 2012}}
At each [[Locale (computer hardware)|locale]] (typically a node), the kernel provides a minimally complete set of node-level utilities necessary for operating a node's underlying hardware and resources. These mechanisms include allocation, management, and disposition of a node's resources, processes, communication, and [[input/output]] management support functions.<ref name="Hansen2001">{{cite book|editor=Hansen, Per Brinch|title=Classic Operating Systems: From Batch Processing to Distributed Systems|url=https://books.google.com/books?id=-PDPBvIPYBkC|year=2001|publisher=Springer|isbn=978-0-387-95113-3}}</ref> Within the kernel, the communications sub-system is of foremost importance for a distributed OS.<ref name="Gościński1991"/>
 
In a distributed OS, the kernel often supports a minimal set of functions, including low-level [[address space]] management, [[thread (computing)|thread]] management, and [[inter-process communication]] (IPC). A kernel of this design is referred to as a ''[[microkernel]]''.<ref>Pecheur, C. 1992. Using LOTOS for specifying the CHORUS distributed operating system kernel. Comput. Commun. 15, 2 (Mar. 1992), 93-102.</ref><ref>Habert, S. and Mosseri, L. 1990. COOL: kernel support for object-oriented environments. In Proceedings of the European Conference on Object-Oriented Programming on Object-Oriented Programming Systems, Languages, and Applications (Ottawa, Canada). OOPSLA/ECOOP '90. ACM, New York, NY, 269-275.</ref> Its modular nature enhances reliability and security, essential features for a distributed OS.<ref name="Sinha1997">{{cite book|last=Sinha|first=Pradeep Kumar |title=Distributed Operating Systems: Concepts and Design|url=https://archive.org/details/distributedopera0000sinh|url-access=registration|year=1997|publisher=IEEE Press|isbn=978-0-7803-1119-0}}</ref> A kernel is commonly replicated identically over all nodes in a system, which implies that the nodes use similar hardware.<!-- is that true? don't most implementations have a variety of hardware and os versions and even distributions?--><ref name="Galli2000">{{cite book|last=Galli|first=Doreen L.|title=Distributed Operating Systems: Concepts and Practice|url=https://archive.org/details/distributedopera00gall |url-access=registration|year=2000|publisher=Prentice Hall|isbn=978-0-13-079843-5}}</ref> The combination of minimal design and ubiquitous node coverage enhances the global system's extensibility and its ability to dynamically introduce new nodes or services.<ref name="Chow1997">{{cite book|last1=Chow|first1=Randy|author2=Theodore Johnson|title=Distributed Operating Systems and Algorithms|url=https://books.google.com/books?id=J4MZAQAAIAAJ|year=1997|publisher=Addison Wesley|isbn=978-0-201-49838-7}}</ref>
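
The minimal function set described above can be illustrated with a small Python sketch. It models a kernel whose only service is message passing between named ports, with an echo service running outside it. The names <code>MicroKernel</code>, <code>register_port</code>, <code>send</code>, <code>receive</code> and <code>echo_server_step</code> are hypothetical and chosen for this illustration; they are not taken from CHORUS, COOL, or any other system cited here.

```python
from collections import deque


class MicroKernel:
    """Toy kernel: its only service is passing messages between named ports."""

    def __init__(self):
        self._ports = {}  # port name -> queue of pending messages

    def register_port(self, name):
        # Create an empty mailbox for a server or client.
        self._ports[name] = deque()

    def send(self, port, message):
        # Mechanism only: deliver the message, never interpret it.
        if port not in self._ports:
            raise KeyError(f"unknown port: {port}")
        self._ports[port].append(message)

    def receive(self, port):
        # Return the oldest pending message, or None if the mailbox is empty.
        queue = self._ports[port]
        return queue.popleft() if queue else None


def echo_server_step(kernel):
    """A user-level service: handle one request and reply through the kernel."""
    request = kernel.receive("echo")
    if request is not None:
        reply_port, payload = request
        kernel.send(reply_port, payload.upper())


kernel = MicroKernel()
kernel.register_port("echo")
kernel.register_port("client")
kernel.send("echo", ("client", "hello"))  # client request carries its reply port
echo_server_step(kernel)                  # service runs outside the kernel
reply = kernel.receive("client")
```

In this sketch the kernel never inspects message contents, so a new service could be added by registering another port without changing the kernel class. That is the extensibility property the paragraph attributes to a minimal kernel design.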
 
[[Image:System Management Components.PNG|thumbnail|right|175px|alt=General overview of system management components that reside above the microkernel.|System management components overview]]
Line 25 ⟶ 23:
System management components are software processes that define the node's ''policies''. They form the part of the OS outside the kernel and provide higher-level communication, process and resource management, reliability, performance and security. The components match the functions of a single-entity system, adding the transparency required in a distributed environment.<ref name="Gościński1991"/>
 
The distributed nature of the OS requires additional services to support a node's responsibilities to the global system. In addition, the system management components accept the "defensive" responsibilities of reliability, availability, and persistence. These responsibilities can conflict with each other. A consistent approach, balanced perspective, and a deep understanding of the overall system can assist in identifying [[diminishing returns]].<!--this sentence is rhetoric. say what is meant. give an example.--> Separation of policy and mechanism mitigates such conflicts.<ref name="Chow1997"/>
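
One way to picture the separation of policy and mechanism is the following Python sketch. It is an assumption-laden illustration, not a description of any cited system: <code>dispatch</code> stands in for a kernel-level mechanism, and <code>fifo_policy</code> and <code>priority_policy</code> stand in for interchangeable policies supplied by a management component. All names and the task records are hypothetical.

```python
def dispatch(ready_tasks, choose):
    """Mechanism: remove and return whichever task the policy selects."""
    task = choose(ready_tasks)
    ready_tasks.remove(task)
    return task


def fifo_policy(tasks):
    # Policy A: run tasks in arrival order.
    return tasks[0]


def priority_policy(tasks):
    # Policy B: run the task with the highest priority value first.
    return max(tasks, key=lambda t: t["priority"])


def run_all(tasks, policy):
    pending = list(tasks)  # copy so the caller's list is left untouched
    order = []
    while pending:
        order.append(dispatch(pending, policy)["name"])
    return order


# Hypothetical task records, invented for this example.
tasks = [
    {"name": "log", "priority": 1},
    {"name": "replicate", "priority": 5},
    {"name": "checkpoint", "priority": 3},
]
fifo_order = run_all(tasks, fifo_policy)
priority_order = run_all(tasks, priority_policy)
```

The same <code>dispatch</code> mechanism serves both orderings, so a node could favour availability or persistence by swapping the policy function while leaving the mechanism unchanged.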
 
===Working together as an operating system===
Line 38 ⟶ 36:
 
==History==
Research and experimentation efforts began in earnest in the 1970s and continued through the 1990s, with focused interest peaking in the late 1980s. A number of distributed operating systems were introduced during this period; however, very few of these implementations achieved even modest commercial success.
 
Fundamental and pioneering implementations of primitive distributed operating system component concepts date to the early 1950s.<ref name=dyseac>{{cite journal |last1=Leiner |first1=Alan L. |title=System Specifications for the DYSEAC |journal=Journal of the ACM |date=April 1954 |volume=1 |issue=2 |pages=57–81 |doi=10.1145/320772.320773 |s2cid=15381094 |doi-access=free |via=ACM Digital Library}}</ref><ref name=lincoln_tx2>{{cite conference |title=The Lincoln TX-2 Input-Output System |first=James W. |last=Forgie |date=February 26–28, 1957 |conference=Western Joint Computer Conference: Techniques for Reliability |publisher=Association for Computing Machinery |location=Los Angeles, California |pages=156–160 |isbn=9781450378611 |doi=10.1145/1455567.1455594 |doi-access=free |via=ACM Digital Library}}</ref><ref name=intercomm_cells>{{cite conference |author=C. Y. Lee |title=Intercommunicating cells, basis for a distributed logic computer |date=December 4–6, 1962 |conference=Fall Joint Computer Conference |publisher=Association for Computing Machinery |via=ACM Digital Library |location=Philadelphia, Pennsylvania |pages=130–136 |doi=10.1145/1461518.1461531 |doi-access=free}}</ref> Some of these individual steps were not focused directly on distributed computing, and at the time, many may not have realized their important impact. These pioneering efforts laid important groundwork, and inspired continued research in areas related to distributed computing.<ref name="Dreyfus_1958_Gamma60">{{citation |title=System design of the Gamma 60 |author-first=Phillippe |author-last=Dreyfus |author-link=Philippe Dreyfus |work=Proceedings of the May 6–8, 1958, [[Western Joint Computer Conference]]: Contrasts in Computers |location=Los Angeles |date=1958-05-08 |orig-year=1958-05-06 |id=IRE-ACM-AIEE '58 (Western) |publication-place=ACM, New York, NY, USA |pages=130–133 |url=https://www.computer.org/csdl/proceedings/afips/1958/5052/00/50520130.pdf |access-date=2017-04-03 |url-status=live |archive-url=https://web.archive.org/web/20170403224547/https://www.computer.org/csdl/proceedings/afips/1958/5052/00/50520130.pdf |archive-date=2017-04-03}}</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1958. Organizing a network of computers to meet deadlines. In Papers and Discussions Presented At the December 9–13, 1957, Eastern Joint Computer Conference: Computers with Deadlines To Meet (Washington, D.C., December 09–13, 1957). IRE-ACM-AIEE '57</ref><ref>Leiner, A. L., Smith, J. L., Notz, W. A., and Weinberger, A. 1958. PILOT, the NBS multicomputer system. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03–05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 71-75.</ref><ref>Bauer, W. F. 1958. Computer design from the programmer's viewpoint. In Papers and Discussions Presented At the December 3–5, 1958, Eastern Joint Computer Conference: Modern Computers: Objectives, Designs, Applications (Philadelphia, Pennsylvania, December 03–05, 1958). AIEE-ACM-IRE '58 (Eastern). ACM, New York, NY, 46-51.</ref><ref>Leiner, A. L., Notz, W. A., Smith, J. L., and Weinberger, A. 1959. PILOT—A New Multiple Computer System. J. ACM 6, 3 (Jul. 1959), 313-335.</ref><ref>Estrin, G. 1960. [https://dl.acm.org/doi/abs/10.1145/1460361.1460365 Organization of computer systems: the fixed plus variable structure computer]. In Papers Presented At the May 3–5, 1960, Western Joint IRE-AIEE-ACM Computer Conference (San Francisco, California, May 03–05, 1960). IRE-AIEE-ACM '60 (Western). ACM, New York, NY, 33-40.</ref>
 
In the mid-1970s, research produced important advances in distributed computing. These breakthroughs provided a solid, stable foundation for efforts that continued through the 1990s.
 
Accelerating research into [[Multiprocessing|multi-processor]] and [[multi-core processor]] systems led to a resurgence of the distributed OS concept.
 
===1950s===
 
===The DYSEAC===
One of the first efforts was the [[DYSEAC]], a general-purpose [[Synchronization (computer science)|synchronous]] computer. In one of the earliest publications of the [[Association for Computing Machinery]], in April 1954, a researcher at the [[National Bureau of Standards]]{{snd}}now the [[National Institute of Standards and Technology]] (NIST){{snd}}presented a detailed specification of the DYSEAC. The introduction focused upon the requirements of the intended applications, including flexible communications, but also mentioned other computers:
 
{{blockquote|Finally, the external devices could even include other full-scale computers employing the same digital language as the DYSEAC. For example, the SEAC or other computers similar to it could be harnessed to the DYSEAC and by use of coordinated programs could be made to work together in mutual cooperation on a common task… Consequently[,] the computer can be used to coordinate the diverse activities of all the external devices into an effective ensemble operation.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}
 
The specification discussed the architecture of multi-computer systems, preferring peer-to-peer rather than master-slave.
{{blockquote|Each member of such an interconnected group of separate computers is free at any time to initiate and dispatch special control orders to any of its partners in the system. As a consequence, the supervisory control over the common task may initially be loosely distributed throughout the system and then temporarily concentrated in one computer, or even passed rapidly from one machine to the other as the need arises. …the various interruption facilities which have been described are based on mutual cooperation between the computer and the external devices subsidiary to it, and do not reflect merely a simple master-slave relationship.|ALAN L. LEINER|''System Specifications for the DYSEAC''}}
 
This is one of the earliest examples of a computer with distributed control. [[United States Department of the Army|Dept. of the Army]] reports<ref>Martin H. Weik, "A Third Survey of Domestic Electronic Digital Computing Systems," Ballistic Research Laboratories Report No. 1115, pg. 234-5, Aberdeen Proving Ground, Maryland, March 1961</ref> certified it as reliable and recorded that it passed all acceptance tests in April 1954. It was completed and delivered on time, in May 1954. This was a "[[portable computer]]", housed in a [[Tractor-trailer#Types of trailers|tractor-trailer]], with two attendant vehicles and [[Refrigerator truck|6 tons of refrigeration]] capacity.
Line 86 ⟶ 82:
This [[Computer configuration|configuration]] was ideal for distributed systems. The constant-time projection through memory for storing and retrieval was inherently [[Atomic operation|atomic]] and [[Mutual exclusion|exclusive]]. The cellular memory's intrinsic distributed characteristics<!-- are these intrinsically distributed or merely abstract?--> would be invaluable. The impact on the [[User interface|user]], [[Computer hardware|hardware]]/[[Peripheral|device]], or [[Application programming interface]]s was indirect. The authors were considering distributed systems, stating:
 
{{blockquote|We wanted to present here the basic ideas of a distributed logic system with... the macroscopic concept of logical design, away from scanning, from searching, from addressing, and from counting, is equally important. We must, at all cost, free ourselves from the burdens of detailed local problems which only befit a machine low on the evolutionary scale of machines.|Chung-Yeol (C. Y.) Lee|''Intercommunicating Cells, Basis for a Distributed Logic Computer''}}
 
===Foundational work===
Line 125 ⟶ 121:
==Distributed computing models==
{{More citations needed section|date=January 2012}}
 
 
===Three basic distributions===
Line 150 ⟶ 144:
* ''Location transparency'' – Location transparency comprises two distinct aspects of transparency, naming transparency and user mobility. Naming transparency requires that nothing in the physical or logical references to any system entity should expose any indication of the entity's location, or its local or remote relationship to the user or application. User mobility requires the consistent referencing of system entities, regardless of the system location from which the reference originates.<ref name="Sinha1997" />{{rp|20}}
* ''Access transparency'' – Local and remote system entities must remain indistinguishable when viewed through the user interface. The distributed operating system maintains this perception through the exposure of a single access mechanism for a system entity, regardless of that entity being local or remote to the user. Transparency dictates that any differences in methods of accessing any particular system entity—either local or remote—must be both invisible to, and undetectable by the user.<ref name="Gościński1991"/>{{rp|84}}<!--what is the difference between referencing and access?-->
* ''Migration transparency'' – Resources and activities migrate from one element to another controlled solely by the system and without user/application knowledge or action.<ref name="Galli2000" />{{rp|16}}
* ''Replication transparency'' – The process or fact that a resource has been duplicated on another element occurs under system control and without user/application knowledge or intervention.<ref name="Galli2000" />{{rp|16}}
* ''Concurrency transparency'' – Users/applications are unaware of and unaffected by the presence/activities of other users.<ref name="Galli2000" />{{rp|16}}
Line 190 ⟶ 184:
:* or a process must establish exclusive access to a shared resource.
 
Improper synchronization can lead to multiple failure modes including loss of [[ACID|atomicity, consistency, isolation and durability]], [[Deadlock (computer science)|deadlock]], [[livelock]] and loss of [[serializability]].{{Citation needed|date=January 2012}}
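
Deadlock, one of the failure modes listed above, can be illustrated on a single machine with threads standing in for nodes. The Python sketch below assumes two hypothetical shared resources and avoids deadlock by always acquiring their locks in a fixed global order. The <code>Resource</code> class, the <code>transfer</code> function and the numeric ids are invented for this illustration, and a real distributed OS would need a distributed locking protocol in place of in-process locks.

```python
import threading


class Resource:
    def __init__(self, rid, value):
        self.rid = rid  # globally unique id used to order lock acquisition
        self.value = value
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    # Acquire both locks in ascending id order so that two opposite
    # transfers can never each hold one lock while waiting for the other.
    first, second = sorted((src, dst), key=lambda r: r.rid)
    with first.lock:
        with second.lock:
            src.value -= amount
            dst.value += amount


def worker(src, dst):
    for _ in range(1000):
        transfer(src, dst, 1)


a = Resource(1, 100)
b = Resource(2, 100)

# Two workers move units in opposite directions at the same time.
threads = [
    threading.Thread(target=worker, args=(a, b)),
    threading.Thread(target=worker, args=(b, a)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If each worker instead locked its own source first, the two threads could each hold one lock and wait indefinitely for the other. Holding both locks for the whole update also keeps each transfer atomic with respect to the other worker.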
 
===Flexibility===
[[Flexibility (engineering)|Flexibility]] in a distributed operating system is enhanced through the modular characteristics of the distributed OS, and by providing a richer set of higher-level services. The completeness and quality of the kernel/microkernel simplifies implementation of such services, and potentially enables a greater choice of providers for such services.{{Citation needed|date=April 2012}}
 
==Research==
Line 216 ⟶ 210:
===Effective and stable in multiple levels of complexity===
 
:Tessellation: Space-Time Partitioning in a Manycore Client OS.<ref>Rose Liu, Kevin Klues, and Sarah Bird, University of California at Berkeley; Steven Hofmeyr, Lawrence Berkeley National Laboratory; [[Krste Asanović]] and John Kubiatowicz, University of California at Berkeley. HotPar09.</ref>
 
==See also==
* {{annotated link|Distributed computing}}
* {{annotated link|HarmonyOS}}
* {{annotated link|OpenHarmony}}
* {{annotated link|BlueOS}}
* {{annotated link|Plan 9 from Bell Labs}}
* {{annotated link|Inferno (operating system)|Inferno}}
* {{annotated link|MINIX}}
* [[Network operating system]] (NOS)
* {{annotated link|Single system image}} (SSI)
* [[Operating system]]
* [[List of operating systems]]
* [[Comparison of operating systems]]
* {{annotated link|Computer systems architecture}}
* {{annotated link|Multikernel}}
* [[List of important publications in concurrent, parallel, and distributed computing]]
* {{annotated link|Operating System Projects}}
* {{annotated link|Edsger W. Dijkstra Prize in Distributed Computing}}
* {{annotated link|List of distributed computing conferences}}
* {{annotated link|List of volunteer computing projects}}
 
==References==
{{Reflist|30em}}
 
==Further reading==
* {{cite book|last1=Chow|first1=Randy|author2=Theodore Johnson|title=Distributed Operating Systems and Algorithms|url=https://books.google.com/books?id=J4MZAQAAIAAJ|year=1997|publisher=Addison Wesley|isbn=978-0-201-49838-7}}
* {{cite book|last=Sinha|first=Pradeep Kumar |title=Distributed Operating Systems: Concepts and Design|url=https://archive.org/details/distributedopera0000sinh|url-access=registration|year=1997|publisher=IEEE Press|isbn=978-0-7803-1119-0}}
* {{cite book|last=Galli|first=Doreen L.|title=Distributed Operating Systems: Concepts and Practice|url=https://archive.org/details/distributedopera00gall |url-access=registration|year=2000|publisher=Prentice Hall|isbn=978-0-13-079843-5}}
 
==External links==
{{Prone to spam|date=May 2022}}
* {{curlie|Computers/Computer_Science/Distributed_Computing/|Distributed computing}}
* {{curlie|Computers/Computer_Science/Distributed_Computing/Publications/|Distributed computing journals}}
* [http://pdos.csail.mit.edu/ MIT Parallel and Distributed Operating System Laboratory]
* [http://parlab.eecs.berkeley.edu/ UCB parallel computing laboratory]
* [http://www.pdl.cmu.edu/index.shtml Parallel Data laboratory]
* [http://doc.cat-v.org/plan_9/4th_edition/papers/9 The distributed environment of Plan 9]
* [http://www.e1os.org/eng/index.html E1 Distributed Operating System]
* [http://fsd-amoeba.sourceforge.net/ Amoeba DOS Source]
* [http://www.cs.vu.nl/pub/amoeba/ Amoeba home page]
* [http://www.usenix.org/ USENIX: Advanced Computing association]
* [http://computer.howstuffworks.com/operating-system.htm How Stuff Works - Operating Systems]
* [http://www.cs.rochester.edu/research/synchronization/pseudocode/ss.html Algorithms for scalable synchronization]
<!-- {{No more links}}

Please be cautious adding more external links.

Wikipedia is not a collection of links and should not be used for advertising.

Excessive or inappropriate links will be removed.

See [[Wikipedia:External links]] and [[Wikipedia:Spam]] for details.

If there are already suitable links, propose additions or replacements on
the article's talk page.

-->
 
{{Distributed operating systems}}
{{Operating system}}
{{Authority control}}
 
{{DEFAULTSORT:Distributed Operating System}}