Remote direct memory access

{{Short description|Low-level hardware direct memory access}}
In [[computing]], '''remote direct memory access''' ('''RDMA''') is [[direct memory access]] from the [[main memory|memory]] of one computer into that of another without involving either computer's [[operating system]]. This permits high-throughput, low-[[Network latency|latency]] memory access over a network, which is especially useful in massively parallel [[computer cluster]]s.
 
== Overview ==
RDMA supports [[zero-copy]] networking by enabling the [[network adapter]] to transfer data from the wire directly to application memory or from application memory directly to the wire, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by [[Central processing unit|CPUs]], [[CPU cache|caches]], or [[context switch]]es, and transfers continue in parallel with other system operations. This reduces latency in message transfer.
 
However, this strategy presents several problems related to the fact that the target node is not notified of the completion of the request (single-sided communications).
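Zero-copy transfers are possible because the application registers its buffers with the network adapter ahead of time, allowing the adapter to read and write them directly while the operating system stays off the data path. The following is a minimal sketch of such a registration using libibverbs, one widely used implementation of the verbs API; the protection domain <code>pd</code>, the buffer size, and the access flags chosen here are illustrative assumptions rather than anything prescribed by the sources cited in this article.

<syntaxhighlight lang="c">
#include <stdlib.h>
#include <infiniband/verbs.h>

/* Sketch: register a buffer so an RDMA-capable adapter can access it
 * directly, bypassing the operating system on the data path.
 * "pd" is an already-allocated protection domain (from ibv_alloc_pd). */
struct ibv_mr *register_buffer(struct ibv_pd *pd, size_t len)
{
    void *buf = malloc(len);            /* ordinary application memory */
    if (!buf)
        return NULL;

    /* Pin the pages and grant the adapter local and remote access.
     * The returned memory region carries lkey/rkey values that later
     * work requests use to name this buffer. */
    return ibv_reg_mr(pd, buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
</syntaxhighlight>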
 
== Acceptance ==
As of 2018, RDMA had achieved broader acceptance as a result of implementation enhancements that enable good performance over ordinary networking infrastructure.<ref>RoCE Rocks over Lossy Network: https://dl.acm.org/citation.cfm?id=3098588&dl=ACM&coll=DL</ref> For example, [[RDMA over Converged Ethernet]] (RoCE) can run over either lossy or lossless infrastructure. In addition, [[iWARP]] enables an [[Ethernet]] RDMA implementation at the physical layer using [[Transmission Control Protocol|TCP]]/[[Internet Protocol|IP]] as the transport, combining the performance and latency advantages of RDMA with a low-cost, standards-based solution.<ref>{{cite web|url=https://www.intel.com/content/dam/support/us/en/documents/network/sb/understanding_iwarp_final.pdf|title=Understanding iWARP|publisher=Intel Corporation|accessdate=16 May 2018}}</ref> The RDMA Consortium and the DAT Collaborative<ref>{{cite web|url=http://www.datcollaborative.org/|title=DAT Collaborative website|accessdate=14 October 2014|url-status=dead|archiveurl=https://web.archive.org/web/20150117180600/http://www.datcollaborative.org/|archivedate=17 January 2015}}</ref> have played key roles in the development of RDMA protocols and [[Application programming interface|APIs]] for consideration by standards groups such as the [[Internet Engineering Task Force]] and the Interconnect Software Consortium.<ref>[http://www.opengroup.org/icsc/ The Interconnect Software Consortium website] {{webarchive|url=https://web.archive.org/web/20050830201232/http://www.opengroup.org/icsc/ |date=2005-08-30 }}</ref>
 
Hardware vendors have started working on higher-capacity RDMA-based network adapters, with rates of 100&nbsp;Gbit/s reported.<ref>{{cite web|url=http://www.mellanox.com/page/file_storage/|title=Microsoft Based Solutions - Mellanox Technologies|accessdate=14 October 2014}}</ref><ref name="chelsio">{{cite web|url=http://www.chelsio.com/chelsio-to-demonstrate-40g-smb-direct-rdma-over-ethernet-for-windows-server-2012/|title=40Gbe SMB Direct RDMA Over Ethernet For Windows Server 2012 - Chelsio Communications|date=2 April 2013 |accessdate=14 October 2014}}</ref> Software vendors, such as [[IBM]],<ref>{{Cite web | url=https://www.openfabrics.org/wp-content/uploads/2022-workshop/2022-workshop-presentations/201_RPolig.pdf |title = SOFA-STORAGE: CREATING A VENDOR AGNOSTIC FRAMEWORK TO ENABLE SEAMLESS STORAGE OFFLOAD USING SMARTNICS}}</ref> [[Red Hat]] and [[Oracle Corporation]], support these APIs in their latest products,<ref>{{Cite web | url=https://access.redhat.com/solutions/22188 |title = What RDMA hardware is supported in Red Hat Enterprise Linux?| date=2 June 2016 }}</ref> and since 2013, engineers have been developing network adapters that implement RDMA over Ethernet.<ref>{{cite web|url=http://www.chelsio.com/chelsio-to-demonstrate-40g-smb-direct-rdma-over-ethernet-for-windows-server-2012/|title=40Gbe SMB Direct RDMA Over Ethernet For Windows Server 2012 - Chelsio Communications|date=2013-04-02|publisher=Chelsio Communications|accessdate=2016-07-15|quote=The demonstration will show Microsoft's Windows Server 2012 SMB Direct running at line-rate 40Gb using RDMA over Ethernet (iWARP).}}</ref>
Both [[Red Hat Enterprise Linux]] and [[Red Hat Enterprise MRG]]<ref>{{cite web|url=https://investors.redhat.com/news-and-events/press-releases/2011/06-23-2011|title=Red Hat Enterprise MRG 2.0 Now Available|accessdate=23 June 2011|url-status=dead|archiveurl=https://web.archive.org/web/20160825215016/https://investors.redhat.com/news-and-events/press-releases/2011/06-23-2011|archivedate=25 August 2016}}</ref> have support for RDMA. Microsoft supports RDMA in [[Windows Server 2012]] via [[Server Message Block|SMB Direct]]. [[VMware ESXi]] also supports RDMA as of 2015.
 
Common RDMA implementations include the [[Virtual Interface Architecture]], [[RDMA over Converged Ethernet]] (RoCE), [[InfiniBand]], [[Omni-Path]], [[iWARP]] and Ultra Ethernet.
 
== Using RDMA ==
Applications access the adapter's control structures using well-defined APIs originally designed for the InfiniBand protocol (although the APIs can be used with any of the underlying RDMA implementations). Using send and completion queues, applications perform RDMA operations by posting work queue entries (WQEs) to the send queue (SQ) and receiving notification of completed operations from the completion queue (CQ).<ref>Storm: a fast transactional dataplane for remote data structures: https://dl.acm.org/doi/abs/10.1145/3319647.3325827</ref>
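The sketch below illustrates this queue-based model with libibverbs, a common implementation of the verbs API: it posts a single one-sided RDMA Write WQE to the send queue and busy-polls the completion queue. The queue pair <code>qp</code>, completion queue <code>cq</code>, registered memory region <code>mr</code>, and the remote address and rkey are assumed to have been exchanged and set up beforehand; these names are illustrative placeholders, not part of the cited work.

<syntaxhighlight lang="c">
#include <stdint.h>
#include <infiniband/verbs.h>

/* Sketch: issue one RDMA Write work request and wait for its completion.
 * Returns 0 on success, -1 on failure. */
int rdma_write_once(struct ibv_qp *qp, struct ibv_cq *cq,
                    struct ibv_mr *mr, void *local_buf, uint32_t len,
                    uint64_t remote_addr, uint32_t remote_rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,  /* registered local buffer */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 42,                /* echoed back in the completion */
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE, /* one-sided write */
        .send_flags = IBV_SEND_SIGNALED, /* request a CQ entry */
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = remote_rkey;

    struct ibv_send_wr *bad_wr = NULL;
    if (ibv_post_send(qp, &wr, &bad_wr))   /* post the WQE to the SQ */
        return -1;

    struct ibv_wc wc;
    int n;
    do {                                   /* poll the CQ until the WQE completes */
        n = ibv_poll_cq(cq, 1, &wc);
    } while (n == 0);

    return (n > 0 && wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
</syntaxhighlight>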
 
== Transport types ==
RDMA can transport data reliably or unreliably over the Reliable Connection (RC) and Unreliable Datagram (UD) transport protocols, respectively. The former preserves requests (no requests are lost), while the latter requires fewer queue pairs when handling multiple connections. This is because UD is connectionless, allowing a single host to communicate with any other host through a single queue pair.<ref>Storm: a fast transactional dataplane for remote data structures: https://dl.acm.org/doi/pdf/10.1145/3319647.3325827</ref>
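The transport type is fixed when a queue pair is created. The following sketch, again using libibverbs and therefore only an illustration under that assumption, shows the single point of difference between the two cases: the <code>qp_type</code> field is set to <code>IBV_QPT_RC</code> for a reliable connection or <code>IBV_QPT_UD</code> for unreliable datagrams. The protection domain and completion queue are assumed to exist, and the queue depths are arbitrary illustrative values.

<syntaxhighlight lang="c">
#include <stdbool.h>
#include <string.h>
#include <infiniband/verbs.h>

/* Sketch: create a queue pair whose transport is either Reliable
 * Connection (RC) or Unreliable Datagram (UD). With UD, one queue pair
 * can exchange datagrams with many peers; with RC, a separate queue
 * pair is needed for each connection. */
struct ibv_qp *create_qp(struct ibv_pd *pd, struct ibv_cq *cq, bool reliable)
{
    struct ibv_qp_init_attr attr;
    memset(&attr, 0, sizeof(attr));

    attr.send_cq = cq;
    attr.recv_cq = cq;
    attr.qp_type = reliable ? IBV_QPT_RC : IBV_QPT_UD;
    attr.cap.max_send_wr  = 128;   /* illustrative queue depths */
    attr.cap.max_recv_wr  = 128;
    attr.cap.max_send_sge = 1;
    attr.cap.max_recv_sge = 1;

    return ibv_create_qp(pd, &attr);
}
</syntaxhighlight>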
 
== References ==
{{Reflist}}
 
== External links ==
* [http://www.rdmaconsortium.org/home RDMA Consortium]
* {{IETF RFC|5040}}: A Remote Direct Memory Access Protocol Specification
* [http://www.hpcwire.com/2006/09/15/a_tutorial_of_the_rdma_model-1/ A Tutorial of the RDMA Model]
* [https://www.hpcwire.com/2006/10/06/why_compromise-1/ "Why Compromise?"] // HPCwire, Gilad Shainer (Mellanox Technologies), 2006
* [http://www.hpcwire.com/hpcwire/2006-08-18/a_critique_of_rdma-1.html A Critique of RDMA] for high-performance computing
* [https://www.cs.utah.edu/~stutsman/cs6450/public/papers/rdma.pdf RDMA Reads: To Use or Not to Use?]
* [https://www.openfabrics.org/wp-content/uploads/2022-workshop/2022-workshop-presentations/201_RPolig.pdf SOFA-Storage: Creating a vendor agnostic framework to enable seamless storage offload using SmartNICs]
 
[[Category:Computer memory]]