{{Short description|Low-level hardware direct memory access}}
In [[computing]], '''remote direct memory access''' ('''RDMA''') is [[direct memory access]] from the [[main memory|memory]] of one [[computer]] into that of another without involving either computer's [[operating system]]. This permits high-throughput, low-[[Network latency|latency]] memory access over a network, which is especially useful in massively parallel [[computer cluster]]s.
 
== Overview ==
RDMA supports [[zero-copy]] networking by enabling the [[network adapter]] to transfer data from the wire directly to application memory or from application memory directly to the wire, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by [[Central processing unit|CPUs]], [[CPU cache|caches]], or [[context switch]]es, and transfers continue in parallel with other system operations. When an application performs an RDMA read or write request, the application data is delivered directly to the network, reducing latency and enabling fast message transfer.
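Zero-copy transfers rely on the application registering ("pinning") its buffers with the network adapter ahead of time, so that the adapter can perform DMA directly on them. A minimal sketch of such a registration using the widely used libibverbs API (error handling abbreviated; the buffer size is illustrative, not prescriptive):

<syntaxhighlight lang="c">
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0])
        return 1;                      /* no RDMA-capable adapter found */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);

    /* Ordinary application memory, registered so the adapter can DMA
     * to/from it directly, bypassing operating-system data buffers. */
    void *buf = malloc(4096);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* lkey is used in local work requests; rkey is given to the peer
     * so it can issue one-sided RDMA reads/writes into this buffer. */
    printf("lkey=%u rkey=%u\n", mr->lkey, mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
</syntaxhighlight>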
 
However, this strategy presents several problems related to the fact that the target node is not notified of the completion of the request (one-sided communications). The common way to notify it is to change a memory byte when the data has been delivered, but this requires the target to poll on that byte. Not only does this polling consume CPU cycles, but the memory footprint and the latency also increase linearly with the number of possible other nodes. These issues, and the fact that the RDMA programming model is very different from the one generally used in the high-performance computing world ([[Message Passing Interface|MPI]]), explain why RDMA's success in HPC has been limited. The send/receive model used by other [[zero-copy]] HPC interconnects such as [[Myrinet]] or [[Quadrics]] does not have any of these problems and offers comparable performance, since their native programming interface is very similar to MPI.
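As an illustration, the byte-polling notification scheme described above might look like the following sketch, in which the <code>rdma_slot</code> layout and function name are hypothetical and the initiator is assumed to RDMA-write the payload into the slot before setting the flag byte:

<syntaxhighlight lang="c">
#include <stdint.h>

/* Hypothetical receive-buffer layout: the initiator writes the payload,
 * then sets the trailing flag byte to signal delivery. */
struct rdma_slot {
    uint8_t payload[4096];
    volatile uint8_t ready;    /* set last by the remote RDMA write */
};

/* Target side: spin on the flag byte until the remote write lands.
 * The busy-poll burns CPU cycles, and one slot is needed per possible
 * peer -- exactly the scaling problem described above. */
static void wait_for_message(struct rdma_slot *slot)
{
    while (!slot->ready)
        ;                      /* busy-poll */
    /* slot->payload is now valid; process it, then rearm the flag. */
    slot->ready = 0;
}
</syntaxhighlight>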
 
== Acceptance ==
RDMA reduces [[Protocol (computing)|protocol]] overhead that can otherwise squeeze out the capacity to move data across a network, degrading [[Network performance|performance]], limiting how fast an application can get the data it needs, and restricting the size and scalability of a [[Computer cluster|cluster]].
As of 2018, RDMA had achieved broader acceptance as a result of implementation enhancements that enable good performance over ordinary networking infrastructure.<ref>RoCE Rocks over Lossy Network: https://dl.acm.org/citation.cfm?id=3098588&dl=ACM&coll=DL</ref> For example, [[RDMA over Converged Ethernet]] (RoCE) is now able to run over either lossy or lossless infrastructure. In addition, [[iWARP]] enables an [[Ethernet]] RDMA implementation at the physical layer using [[Transmission Control Protocol|TCP]]/[[Internet Protocol|IP]] as the transport, combining the performance and latency advantages of RDMA with a low-cost, standards-based solution.<ref>{{cite web|url=https://www.intel.com/content/dam/support/us/en/documents/network/sb/understanding_iwarp_final.pdf|title=Understanding iWARP|publisher=Intel Corporation|accessdate=16 May 2018}}</ref> The RDMA Consortium and the DAT Collaborative<ref>{{cite web|url=http://www.datcollaborative.org/|title=DAT Collaborative website|accessdate=14 October 2014|url-status=dead|archiveurl=https://web.archive.org/web/20150117180600/http://www.datcollaborative.org/|archivedate=17 January 2015}}</ref> have played key roles in the development of RDMA protocols and [[Application programming interface|APIs]] for consideration by standards groups such as the [[Internet Engineering Task Force]] and the Interconnect Software Consortium.<ref>[http://www.opengroup.org/icsc/ The Interconnect Software Consortium website.] {{webarchive|url=https://web.archive.org/web/20050830201232/http://www.opengroup.org/icsc/ |date=2005-08-30 }}</ref>
 
Hardware vendors have started working on higher-capacity RDMA-based network adapters, with rates of 100&nbsp;Gbit/s reported.<ref>{{cite web|url=http://www.mellanox.com/page/file_storage/|title=Microsoft Based Solutions - Mellanox Technologies|accessdate=14 October 2014}}</ref><ref name="chelsio">{{cite web|url=http://www.chelsio.com/chelsio-to-demonstrate-40g-smb-direct-rdma-over-ethernet-for-windows-server-2012/|title=40Gbe SMB Direct RDMA Over Ethernet For Windows Server 2012 - Chelsio Communications|date=2 April 2013 |accessdate=14 October 2014}}</ref> Software vendors, such as [[IBM]],<ref>{{Cite web | url=https://www.openfabrics.org/wp-content/uploads/2022-workshop/2022-workshop-presentations/201_RPolig.pdf |title = SOFA-STORAGE: CREATING A VENDOR AGNOSTIC FRAMEWORK TO ENABLE SEAMLESS STORAGE OFFLOAD USING SMARTNICS}}</ref> [[Red Hat]] and [[Oracle Corporation]], support these APIs in their latest products,<ref>{{Cite web | url=https://access.redhat.com/solutions/22188 |title = What RDMA hardware is supported in Red Hat Enterprise Linux?| date=2 June 2016 }}</ref> and engineers have developed network adapters that implement RDMA over Ethernet.<ref>
{{cite web
| url= http://www.chelsio.com/chelsio-to-demonstrate-40g-smb-direct-rdma-over-ethernet-for-windows-server-2012/
| title= 40Gbe SMB Direct RDMA Over Ethernet For Windows Server 2012 - Chelsio Communications
| date = 2013-04-02
| publisher= Chelsio Communications
| accessdate= 2016-07-15
| quote = The demonstration will show Microsoft's Windows Server 2012 SMB Direct running at line-rate 40Gb using RDMA over Ethernet (iWARP).
}}
</ref>
Both [[Red Hat Enterprise Linux]] and [[Red Hat Enterprise MRG]]<ref>{{cite web|url=https://investors.redhat.com/news-and-events/press-releases/2011/06-23-2011|title=Red Hat Enterprise MRG 2.0 Now Available|accessdate=23 June 2011|url-status=dead|archiveurl=https://web.archive.org/web/20160825215016/https://investors.redhat.com/news-and-events/press-releases/2011/06-23-2011|archivedate=25 August 2016}}</ref> have support for RDMA. Microsoft supports RDMA in [[Windows Server 2012]] via [[Server Message Block|SMB Direct]]. [[VMware ESXi]] also supports RDMA as of 2015.
 
Common RDMA implementations include the [[Virtual Interface Architecture]], [[RDMA over Converged Ethernet]] (RoCE), [[InfiniBand]], [[Omni-Path]], [[iWARP]] and Ultra Ethernet.
 
== Using RDMA ==
Applications access control structures using well-defined APIs originally designed for the InfiniBand protocol (although the APIs can be used with any of the underlying RDMA implementations). Using send and completion queues, applications perform RDMA operations by submitting work queue entries (WQEs) to the submission queue (SQ) and being notified of responses through the completion queue (CQ).<ref>Storm: a fast transactional dataplane for remote data structures: https://dl.acm.org/doi/abs/10.1145/3319647.3325827</ref>
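With the libibverbs API, for instance, a one-sided RDMA write is submitted to the send queue as a work request, and its completion is reaped by polling the completion queue. A minimal sketch follows; connection setup, queue-pair state transitions, and the out-of-band exchange of the remote address and rkey are omitted, and <code>qp</code>, <code>cq</code>, <code>mr</code>, <code>remote_addr</code> and <code>remote_rkey</code> are assumed to have been established beforehand:

<syntaxhighlight lang="c">
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

/* Post one RDMA-write WQE to the send queue (SQ), then busy-poll the
 * completion queue (CQ) until the corresponding completion arrives. */
static int rdma_write_sync(struct ibv_qp *qp, struct ibv_cq *cq,
                           struct ibv_mr *mr, void *buf, size_t len,
                           uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,  /* one-sided write */
        .send_flags = IBV_SEND_SIGNALED,  /* request a CQ entry */
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))  /* enqueue the WQE */
        return -1;

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)  /* reap the completion */
        ;
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}
</syntaxhighlight>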
== Transport types ==
RDMA can transport data reliably or unreliably over the Reliable Connected (RC) and Unreliable Datagram (UD) transport protocols, respectively. The former has the benefit of preserving requests (no requests are lost), while the latter requires fewer queue pairs when handling multiple connections. This is because UD is connectionless, allowing a single host to communicate with any other host using a single queue.<ref>Storm: a fast transactional dataplane for remote data structures: https://dl.acm.org/doi/pdf/10.1145/3319647.3325827</ref>
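In libibverbs terms, the transport type is fixed when the queue pair is created; a sketch in which only the <code>qp_type</code> field differs between the two cases (the queue depths are illustrative):

<syntaxhighlight lang="c">
#include <infiniband/verbs.h>

/* Create a queue pair of the requested transport type: IBV_QPT_RC
 * (reliable, connected to exactly one peer) or IBV_QPT_UD
 * (unreliable datagram, one QP can reach any number of peers). */
static struct ibv_qp *make_qp(struct ibv_pd *pd, struct ibv_cq *cq,
                              enum ibv_qp_type type)
{
    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .qp_type = type,              /* IBV_QPT_RC or IBV_QPT_UD */
        .cap = {
            .max_send_wr  = 64,       /* illustrative queue depths */
            .max_recv_wr  = 64,
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
    };
    return ibv_create_qp(pd, &attr);
}
</syntaxhighlight>

With RC, one such queue pair is tied to exactly one remote peer, so ''n'' peers require ''n'' queue pairs; with UD, a single queue pair can exchange datagrams with every peer, though one-sided RDMA read and write operations are not available over UD.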
 
== References ==
{{Reflist}}
 
== External links ==
* [http://www.rdmaconsortium.org/home RDMA Consortium]
* {{IETF RFC|5040}}: A Remote Direct Memory Access Protocol Specification
* [http://www.hpcwire.com/2006/09/15/a_tutorial_of_the_rdma_model-1/ A Tutorial of the RDMA Model]
* [https://www.hpcwire.com/2006/10/06/why_compromise-1/ "Why Compromise?"] // HPCwire, Gilad Shainer (Mellanox Technologies), 2006
* [http://www.hpcwire.com/hpcwire/2006-08-18/a_critique_of_rdma-1.html A Critique of RDMA] for high-performance computing
* [https://www.cs.utah.edu/~stutsman/cs6450/public/papers/rdma.pdf RDMA Reads: To Use or Not to Use?]
* [https://www.openfabrics.org/wp-content/uploads/2022-workshop/2022-workshop-presentations/201_RPolig.pdf SOFA-STORAGE: Creating a vendor-agnostic framework to enable seamless storage offload using SmartNICs]
 
[[Category:Computer memory]]
[[Category:Operating system technology]]
[[Category:Local area networks]]
[[Category:Computer organization]]