Reliability (computer networking)

{{Short description|Protocol acknowledgement capability}}
{{Use American English|date=January 2020}}
In [[computer networking]], a '''reliable''' protocol is a [[communication protocol]] that notifies the sender whether or not the delivery of data to intended recipients was successful. [[Reliability engineering|Reliability]] is a synonym for '''assurance''', which is the term used by the [[ITU]] and [[ATM Forum]], and leads to '''[[fault-tolerant]] messaging'''.
 
Reliable protocols typically incur more overhead than unreliable protocols, and as a result, function more slowly and with less scalability. This often is not an issue for [[unicast]] protocols, but it may become a problem for [[reliable multicast]] protocols.
 
[[Transmission Control Protocol]] (TCP), the main protocol used on the [[Internet]], is a reliable unicast protocol; it provides the abstraction of a [[reliable byte stream]] to applications. [[User Datagram Protocol|UDP]] is an unreliable protocol and is often used in [[computer games]], [[streaming media]] or in other situations where speed is an issue and some data loss may be tolerated because of the transitory nature of the data.
 
Often, a reliable unicast protocol is also [[connection-oriented]]. For example, TCP is connection-oriented, with the [[virtual circuit|virtual-circuit]] ID consisting of source and destination [[IP address]]es and port numbers. However, some unreliable protocols are connection-oriented, such as [[Asynchronous Transfer Mode]] and [[Frame Relay]]. In addition, some connectionless protocols, such as [[IEEE 802.11]], are reliable.
 
==History==
Building on the [[packet switching]] concepts proposed by [[Donald Davies]], the first [[communication protocol]] on the [[ARPANET]] was a reliable packet delivery procedure to connect its hosts via the [[BBN Report 1822|1822 interface]].<ref name="J. Gillies, R. Cailliau">{{cite book|last1=Gillies|first1=J.|url=https://books.google.com/books?id=pIH-JijUNS0C&lpg=PA25&ots=MKZj0F7pJN&pg=PA25#v=onepage&q&f=false|title=How the Web was Born: The Story of the World Wide Web|last2=Cailliau|first2=R.|date=2000|publisher=[[Oxford University Press]]|isbn=0192862073|pages=23–25}}</ref><ref name=":2">{{cite journal|last1=Roberts|first1=Dr. Lawrence G.|date=November 1978|title=The Evolution of Packet Switching|url=http://www.ismlab.usf.edu/dcom/Ch10_Roberts_EvolutionPacketSwitching_IEEE_1978.pdf|journal=IEEE Invited Paper|volume=|pages=|access-date=September 10, 2017|quote=In nearly all respects, Davies’ original proposal, developed in late 1965, was similar to the actual networks being built today.|via=}}</ref> A host computer simply arranged the data in the correct packet format, inserted the address of the destination host computer, and sent the message across the interface to its connected [[Interface Message Processor]] (IMP). Once the message was delivered to the destination host, an acknowledgment was delivered to the sending host. If the network could not deliver the message, the IMP would send an error message back to the sending host.
 
Meanwhile, the developers of [[CYCLADES]] and of [[ALOHAnet]] demonstrated that it was possible to build an effective computer network without providing reliable packet transmission. This lesson was later embraced by the designers of [[Ethernet]].
 
==Reliability properties==
A reliable service is one that notifies the user if delivery fails, while an ''unreliable'' one does not notify the user if delivery fails.{{citation needed|reason=See [[Talk:Reliability (computer networking)#Latest revision as of 16:16, 25 October 2017 by Kvng]]|date=November 2017}} For example, [[Internet Protocol]] (IP) provides an unreliable service. Together, [[Transmission Control Protocol]] (TCP) and IP provide a reliable service, whereas [[User Datagram Protocol]] (UDP) and IP provide an unreliable one.
 
In the context of distributed protocols, reliability properties specify the guarantees that the protocol provides with respect to the delivery of messages to the intended recipient(s).

Reliable messaging is the concept of [[message passing]] across an unreliable infrastructure while being able to make certain guarantees about the successful transmission of the messages.<ref>[http://www.w3.org/2001/03/WSWS-popa/paper40 W3C paper on reliable messaging]</ref> For example, a protocol may guarantee that a message, if delivered, is delivered at most once, or that all messages successfully delivered arrive in a particular order.
 
Reliable delivery can be contrasted with [[best-effort delivery]], where there is no guarantee that messages will be delivered quickly, in order, or at all.
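
The at-most-once and ordering guarantees mentioned above are commonly obtained by numbering messages. The following is a minimal sketch, not taken from any particular protocol: the class name <code>OrderedReceiver</code> and the use of consecutive integer sequence numbers starting at 0 are assumptions made for illustration.

```python
class OrderedReceiver:
    """Illustrative receiver that delivers each message at most once, in order.

    Assumes the sender numbers messages 0, 1, 2, ... (a hypothetical scheme).
    """

    def __init__(self):
        self.next_seq = 0    # next sequence number to hand to the application
        self.pending = {}    # out-of-order messages waiting for a gap to fill
        self.delivered = []  # what the application has seen, in order

    def receive(self, seq, payload):
        # Drop duplicates: already delivered, or already buffered.
        if seq < self.next_seq or seq in self.pending:
            return
        self.pending[seq] = payload
        # Deliver every message that is now contiguous with what came before.
        while self.next_seq in self.pending:
            self.delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


receiver = OrderedReceiver()
# The underlying network reorders and duplicates messages.
for seq, payload in [(1, "b"), (0, "a"), (1, "b"), (2, "c"), (0, "a")]:
    receiver.receive(seq, payload)
print(receiver.delivered)
```

A best-effort service, by contrast, would pass each arriving message to the application as it came, duplicates and reordering included.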
 
==Implementations==
A reliable delivery protocol can be built on an unreliable protocol. An extremely common example is the layering of [[Transmission Control Protocol]] on the [[Internet Protocol]], a combination known as [[TCP/IP]].
 
Strong reliability properties are offered by [[group communication system]]s (GCSs) such as [[IS-IS]], [[Appia framework]], [[Spread (group communication system)|Spread]], [[JGroups]] or [[QuickSilver Scalable Multicast]]. The [[QuickSilver Properties Framework]] is a flexible platform that allows strong reliability properties to be expressed in a purely declarative manner, using a simple rule-based language, and automatically translated into a hierarchical protocol.
 
One protocol that implements reliable messaging is [[WS-ReliableMessaging]], which handles reliable delivery of [[SOAP]] messages.<ref>[http://download.boulder.ibm.com/ibmdl/pub/software/dw/specs/ws-rm/ws-reliablemessaging200502.pdf WS-ReliableMessaging specification (PDF)]</ref>
 
The [[Asynchronous Transfer Mode|ATM]] Service-Specific Coordination Function provides for transparent assured delivery with [[ATM Adaptation Layer 5|AAL5]].<ref>Young-ki Hwang, et al., ''Service Specific Coordination Function for Transparent Assured Delivery with AAL5 (SSCF-TADAS)'', Military Communications Conference Proceedings, 1999. MILCOM 1999, vol.2, pages 878–882. {{doi|10.1109/MILCOM.1999.821329}}</ref><ref name="ATMF-INTRO">ATM Forum, The User Network Interface (UNI), v. 3.1, {{ISBN|0-13-393828-X}}, Prentice Hall PTR, 1995.</ref><ref name="AAL-5 spec">ITU-T, ''B-ISDN ATM Adaptation Layer specification: Type 5 AAL'', Recommendation I.363.5, International Telecommunication Union, 1998.</ref>
 
[[IEEE 802.11]] attempts to provide reliable service for all traffic. The sending station will resend a frame if the sending station does not receive an ACK frame within a predetermined period of time.<!--[[User:Kvng/RTH]]-->
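
The resend-on-missing-ACK behavior can be sketched as a simple retry loop. This is an illustration of the general idea only, not the 802.11 procedure itself: the function names, the retry limit, and the modeling of the channel as a function that returns <code>True</code> when an ACK arrives in time are all assumptions.

```python
import random


def send_with_retries(frame, channel, max_retries=4):
    """Send one frame, resending until an ACK arrives or retries run out.

    `channel(frame)` is a hypothetical callable returning True when an ACK
    was received before the timeout. Returns the number of transmissions
    used, or None if delivery failed; None is the point at which a reliable
    protocol would notify the sender of the failure.
    """
    for attempt in range(1, max_retries + 2):  # one initial try plus retries
        if channel(frame):
            return attempt
    return None


def lossy_channel(loss_probability, rng):
    """Hypothetical channel that loses a frame (or its ACK) at random."""
    def transmit(frame):
        return rng.random() >= loss_probability
    return transmit


rng = random.Random(1)  # fixed seed so repeated runs behave the same way
attempts = send_with_retries(b"data", lossy_channel(0.3, rng))
print("delivery failed" if attempts is None else f"delivered after {attempts} transmission(s)")
```

Each retry raises the probability of eventual delivery above that of a single transmission, at the cost of extra delay.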
 
==Real-time systems==
There is, however, a problem with the definition of reliability as "delivery or notification of failure" in [[real-time computing]]. In such systems, failure to deliver the real-time data will adversely affect the performance of the systems, and some systems, e.g. [[safety-critical]], [[Safety-involved systems|safety-involved]], and some secure [[mission-critical]] systems, must be [[formal methods|proved]] to perform at some specified minimum level. This, in turn, requires that a specified minimum reliability for the delivery of the critical data be met. Therefore, in these cases, it is only the delivery that matters; notification of the failure to deliver does not ameliorate the failure. In [[hard real-time system]]s, all data must be delivered by the deadline or it is considered a system failure. In [[firm real-time system]]s, late data is still valueless but the system can tolerate some amount of late or missing data.<ref name = "Schneider et al 2001">S., Schneider, G., Pardo-Castellote, M., Hamilton. "Can Ethernet Be Real Time?", Real-Time Innovations, Inc., 2001</ref><ref name = "Rubenstein et al 1998">Dan Rubenstein, Jim Kurose, Don Towsley, "Real-Time Reliable Multicast Using Proactive Forward Error Correction", NOSSDAV ’98</ref>
 
There are a number of protocols that are capable of addressing real-time requirements for reliable delivery and timeliness, at least for firm real-time systems (due to the unavoidable losses from, e.g., the physical layer [[bit error rate]]s):
 
[[MIL-STD-1553B]] and [[STANAG 3910]] are well-known examples of such timely and reliable protocols for [[avionics#Aircraft networks|avionic data buses]]. MIL-1553 uses a 1&nbsp;Mbit/s shared media for the transmission of data and the control of these transmissions, and is widely used in federated military [[avionics]] systems (in which "Each [[System#Subsystem|system]] has its own computers performing its own functions"<ref name="Ekman_SAAB">{{citation |author=Mats Ekman |title=Avionic Architectures Trends and challenges |url=https://www.kth.se/polopoly_fs/1.146328!/Menu/general/column-content/attachment/3_Ekman_Saab.pdf |publisher=KTH |accessdate=2015-02-03 |url-status=dead |archive-url=https://web.archive.org/web/20150203164824/https://www.kth.se/polopoly_fs/1.146328!/Menu/general/column-content/attachment/3_Ekman_Saab.pdf |archive-date=2015-02-03 |quote=Each system has its own computers performing its own functions}}</ref>). It uses a bus controller (BC) to command the connected remote terminals (RTs) to receive or transmit this data. The BC can, therefore, ensure that there will be no [[network congestion|congestion]], and transfers are always timely. The MIL-1553 protocol also allows for automatic retries that can still ensure timely delivery and increase the reliability above that of the physical layer. STANAG 3910, also known as EFABus in its use on the [[Eurofighter Typhoon]], is, in effect, a version of MIL-1553 augmented with a 20&nbsp;Mbit/s shared media bus for data transfers, retaining the 1&nbsp;Mbit/s shared media bus for control purposes.
 
[[Asynchronous Transfer Mode]] (ATM), [[Avionics Full-Duplex Switched Ethernet]] (AFDX), and [[Time Triggered Ethernet]] (TTEthernet) are examples of packet-switched network protocols where the timeliness and reliability of data transfers can be assured by the network. AFDX and TTEthernet are also based on IEEE 802.3 Ethernet, though not entirely compatible with it.
 
ATM uses connection-oriented [[virtual channel]]s (VCs) which have fully deterministic paths through the network, and [[UPC and NPC|usage and network parameter control]] (UPC/NPC), which are implemented within the network, to limit the traffic on each VC separately. This allows the usage of the shared resources (switch buffers) in the network to be calculated from the parameters of the traffic to be carried in advance, i.e. at system design time. That they are implemented by the network means that these calculations remain valid even when other users of the network behave in unexpected ways, i.e. transmit more data than they are expected to. The calculated usages can then be compared with the capacities of these resources to show that, given the constraints on the routes and the bandwidths of these connections, the resource used for these transfers will never be over-subscribed. These transfers will therefore never be affected by congestion and there will be no losses due to this effect. Then, from the predicted maximum usages of the switch buffers, the maximum delay through the network can also be predicted. However, for the reliability and timeliness to be proved, and for the proofs to be tolerant of faults in and malicious actions by the equipment connected to the network, the calculations of these resource usages cannot be based on any parameters that are not actively enforced by the network, i.e. they cannot be based on what the sources of the traffic are expected to do or on statistical analyses of the traffic characteristics (see [[network calculus]]).<ref>{{cite journal| first1=Y. J. | last1=Kim | first2=S. C. | last2=Chang | first3=C. K. | last3=Un | first4=B. C. | last4=Shin | title=UPC/NPC algorithm for guaranteed QoS in ATM networks | journal=Computer Communications | volume=19 | number=3 | date=March 1996 | pages=216–225 | publisher=[[Elsevier Science Publishers]] | location=Amsterdam, the Netherlands | doi=10.1016/0140-3664(96)01063-8 }}</ref>
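
The kind of design-time calculation described above can be illustrated with a simplified network-calculus bound. The sketch assumes that each connection is policed to a burst size and a sustained rate (a leaky-bucket model), and that all connections share one first-in, first-out output buffer drained at a constant link rate; the function name and the example figures are hypothetical, and packetization and switch internals are ignored.

```python
def fifo_bounds(flows, link_rate_bps):
    """Worst-case backlog and delay at one shared FIFO output port.

    `flows` is a list of (burst_bits, sustained_rate_bps) pairs that the
    network is assumed to enforce per connection. With the sum of rates not
    exceeding the link rate, the backlog never exceeds the sum of the bursts,
    and the queuing delay never exceeds that backlog divided by the link rate.
    """
    total_burst_bits = sum(burst for burst, _ in flows)
    total_rate_bps = sum(rate for _, rate in flows)
    if total_rate_bps > link_rate_bps:
        raise ValueError("link is over-subscribed: no finite bound exists")
    return total_burst_bits, total_burst_bits / link_rate_bps


# Two policed connections sharing a 10 Mbit/s output port (made-up numbers).
backlog_bits, delay_s = fifo_bounds([(8_000, 1e6), (16_000, 2e6)], 10e6)
print(f"buffer needed: {backlog_bits} bits, worst-case queuing delay: {delay_s * 1e3:.1f} ms")
```

Because the inputs are the enforced limits rather than the expected behavior of the sources, the bound continues to hold if a source misbehaves.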
 
AFDX uses frequency-domain bandwidth allocation and [[Traffic policing (communications)|traffic policing]], which allow the traffic on each virtual link (VL) to be limited so that the requirements for shared resources can be predicted and [[congestion avoidance|congestion prevented]], so that congestion can be proved not to affect the critical data.<ref>AFDX Tutorial, {{cite web |url=http://www.techsat.com/fileadmin/media/pdf/infokiosk/TechSAT_TUT-AFDX-EN.pdf |title=AFDX® / ARINC 664 Tutorial |publisher=TechSAT |date=2008-08-29 |access-date=2015-02-03 |url-status=dead |archive-url=https://web.archive.org/web/20150618140031/http://www.techsat.com/fileadmin/media/pdf/infokiosk/TechSAT_TUT-AFDX-EN.pdf |archive-date=2015-06-18 }}</ref> However, the techniques for predicting the resource requirements and proving that congestion is prevented are not part of the AFDX standard.
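
As a rough illustration of how a per-link limit translates into a predictable load, the sketch below assumes that a virtual link may send at most one frame of bounded size per Bandwidth Allocation Gap (BAG). The function names and example values are hypothetical, and Ethernet framing overhead such as the preamble and inter-frame gap is ignored.

```python
def vl_max_rate_bps(bag_ms, max_frame_bytes):
    """Upper bound on a virtual link's rate: one maximum-size frame per BAG."""
    return max_frame_bytes * 8 * 1000 / bag_ms


def link_is_not_oversubscribed(virtual_links, link_rate_bps):
    """Check that the summed per-VL bounds fit within the physical link rate.

    `virtual_links` is a list of (bag_ms, max_frame_bytes) pairs.
    """
    total = sum(vl_max_rate_bps(bag, size) for bag, size in virtual_links)
    return total <= link_rate_bps


print(vl_max_rate_bps(2, 1518))  # bits per second for a 2 ms BAG and 1518-byte frames
print(link_is_not_oversubscribed([(2, 1518), (4, 200)], 100e6))
```

A check of this kind is only the first step; as noted above, proving that congestion cannot affect critical data requires analysis that lies outside the standard.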
 
TTEthernet provides the lowest possible latency in transferring data across the network by using time-domain control methods – each time triggered transfer is scheduled at a specific time so that contention for shared resources is entirely controlled and thus the possibility of congestion is eliminated. The switches in the network enforce this timing to provide tolerance of faults in, and malicious actions on the part of, the other connected equipment. However, "synchronized local clocks are the fundamental prerequisite for time-triggered communication".<ref>Wilfried Steiner and Bruno Dutertre, "[https://web.archive.org/web/20230125090223/http://www.csl.sri.com/users/bruno/publis/fmics2010.pdf ''SMT-Based Formal Verification of a ''TTEthernet'' Synchronization Function'']", S. Kowalewski and M. Roveri (Eds.), FMICS 2010, LNCS 6371, pp. 148–163, 2010.</ref> This is because the sources of critical data will have to have the same view of time as the switch, in order that they can transmit at the correct time and the switch will see this as correct. This also requires that the sequence with which a critical transfer is scheduled has to be predictable to both source and switch. This, in turn, will limit the transmission schedule to a highly deterministic one, e.g. the [[cyclic executive]].
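
The idea that a time-triggered schedule removes contention can be illustrated by checking that the transmission windows assigned to one shared link never overlap within a cycle. This is a sketch of the general principle only, not of TTEthernet's scheduling or synchronization mechanisms; the function name and the representation of a schedule as (start, duration) pairs in microseconds are assumptions.

```python
def schedule_is_contention_free(slots, cycle_us):
    """Return True if no two transmission windows on a link overlap.

    `slots` is a list of (start_us, duration_us) pairs, each measured from
    the start of a repeating cycle of length `cycle_us`.
    """
    ordered = sorted(slots)
    # Every window must be non-empty and lie entirely inside the cycle.
    for start, duration in ordered:
        if start < 0 or duration <= 0 or start + duration > cycle_us:
            return False
    # After sorting by start time, only neighboring windows can overlap.
    for (start_a, duration_a), (start_b, _) in zip(ordered, ordered[1:]):
        if start_a + duration_a > start_b:
            return False
    return True


print(schedule_is_contention_free([(0, 100), (100, 50), (500, 200)], 1000))
print(schedule_is_contention_free([(0, 100), (50, 50)], 1000))
```

Such a check is only meaningful if every sender and the switch agree on when the cycle starts, which is why clock synchronization is a prerequisite.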
 
However, low latency in transferring data over the bus or network does not necessarily translate into low transport delays between the application processes that source and sink this data. This is especially true where the transfers over the bus or network are cyclically scheduled (as is commonly the case with MIL-STD-1553B and STANAG 3910, and necessarily so with AFDX and TTEthernet) but the application processes are not synchronized with this schedule. In this case, the maximum delay and jitter will be twice the update interval for the cyclic transfer (transfers wait up to the update interval between release and transmission and again wait up to the update interval between delivery and use).
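
The factor of two can be checked with a small model. The sketch assumes that the bus transfer and the consuming process both run with the same period but with unrelated phases, that the transfer itself takes no time, and that all times are integers in microseconds; the function names are hypothetical.

```python
def next_tick(t, period, phase):
    """Earliest activation time >= t of a periodic activity with this phase."""
    k = -(-(t - phase) // period)  # integer ceiling of (t - phase) / period
    return phase + k * period


def end_to_end_delay(release, period, bus_phase, reader_phase):
    """Delay from data release until the unsynchronized reader uses it."""
    sent = next_tick(release, period, bus_phase)   # wait for the cyclic transfer
    used = next_tick(sent, period, reader_phase)   # wait for the reader to run
    return used - release


period = 1000
worst = max(
    end_to_end_delay(release, period, 0, reader_phase)
    for release in range(period)
    for reader_phase in range(period)
)
print(worst, "of a possible", 2 * period)
```

Each wait is strictly shorter than one period, so the total approaches but does not reach twice the update interval.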
 
With both AFDX and TTEthernet, additional functions are required of the network interfaces for the transmission of critical data, e.g. AFDX's Bandwidth Allocation Gap control and TTEthernet's requirement for very close synchronization of the sources of time-triggered data, which make it difficult to use standard Ethernet interfaces. Other methods for controlling the traffic in the network that would allow the use of standard IEEE 802.3 network interfaces are a subject of current research.<ref name="Charlton et al 2013">{{citation |author=D. W. Charlton |display-authors=etal |title=An Avionic Gigabit Ethernet Network |work=Avionics, Fiber-Optics and Photonics Conference (AVFOP) |year=2013 |publisher=IEEE |pages=17–18 |doi=10.1109/AVFOP.2013.6661601 |isbn=978-1-4244-7348-9 |s2cid=3162009 }}</ref>
==See also==
*{{anl|Robustness of complex networks}}
*{{anl|Efficiency (network science)}}
*{{anl|Cascading failure}}
 
==References==