{{Short description|Low-level hardware direct memory access}}
In [[computing]], '''remote direct memory access''' ('''RDMA''') is [[direct memory access]] from the [[main memory|memory]] of one computer into that of another without involving either computer's [[operating system]]. This permits high-throughput, low-[[Network latency|latency]] memory access over a network, which is especially useful in massively parallel [[computer cluster]]s.
 
== Overview ==
RDMA supports [[zero-copy]] networking by enabling the [[network adapter]] to transfer data from the wire directly to application memory or from application memory directly to the wire, eliminating the need to copy data between application memory and the data buffers in the operating system. Such transfers require no work to be done by [[Central processing unit|CPUs]], [[CPU cache|caches]], or [[context switch]]es, and transfers continue in parallel with other system operations. This reduces latency in message transfer.
 
However, this strategy presents several problems related to the fact that the target node is not notified of the completion of the request (single-sided communications).
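One-sided transfers of this kind operate on memory that the application has registered with the network adapter in advance. The following is a minimal sketch of that registration step using the verbs API (libibverbs); the buffer size and access flags are illustrative assumptions, and device selection and error handling are abbreviated.

<syntaxhighlight lang="c">
/* Minimal sketch: registering application memory for zero-copy RDMA
 * with the libibverbs API. Error handling is abbreviated and
 * connection setup is omitted. */
#include <infiniband/verbs.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int num_devices;
    struct ibv_device **devs = ibv_get_device_list(&num_devices);
    if (!devs || num_devices == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);      /* protection domain */

    /* Allocate an ordinary buffer and register it with the adapter.
     * After registration the NIC can DMA into and out of this memory
     * without copying through operating-system buffers. */
    size_t len = 4096;                           /* illustrative size */
    void *buf = malloc(len);
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    /* mr->lkey authorizes local use; mr->rkey is handed to the peer
     * so it can issue one-sided RDMA reads and writes into this
     * buffer, which complete without notifying this host's CPU. */
    printf("lkey=0x%x rkey=0x%x\n",
           (unsigned) mr->lkey, (unsigned) mr->rkey);

    ibv_dereg_mr(mr);
    free(buf);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
</syntaxhighlight>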
 
== Acceptance ==
As of 2018, RDMA had achieved broader acceptance as a result of implementation enhancements that enable good performance over ordinary networking infrastructure.<ref>{{cite web|url=https://dl.acm.org/citation.cfm?id=3098588&dl=ACM&coll=DL|title=RoCE Rocks over Lossy Network}}</ref> For example, [[RDMA over Converged Ethernet]] (RoCE) can now run over either lossy or lossless infrastructure. In addition, [[iWARP]] implements RDMA over [[Transmission Control Protocol|TCP]]/[[Internet Protocol|IP]], allowing [[Ethernet]] to serve as the physical layer and combining the performance and latency advantages of RDMA with a low-cost, standards-based solution.<ref>{{cite web|url=https://www.intel.com/content/dam/support/us/en/documents/network/sb/understanding_iwarp_final.pdf|title=Understanding iWARP|publisher=Intel Corporation|accessdate=16 May 2018}}</ref> The RDMA Consortium and the DAT Collaborative<ref>{{cite web|url=http://www.datcollaborative.org/|title=DAT Collaborative website|accessdate=14 October 2014|url-status=dead|archiveurl=https://web.archive.org/web/20150117180600/http://www.datcollaborative.org/|archivedate=17 January 2015}}</ref> have played key roles in the development of RDMA protocols and [[Application programming interface|APIs]] for consideration by standards groups such as the [[Internet Engineering Task Force]] and the Interconnect Software Consortium.<ref>[http://www.opengroup.org/icsc/ The Interconnect Software Consortium website] {{webarchive|url=https://web.archive.org/web/20050830201232/http://www.opengroup.org/icsc/ |date=2005-08-30 }}</ref>
 
Hardware vendors have started working on higher-capacity RDMA-based network adapters, with rates of 100&nbsp;Gbit/s reported.<ref>{{cite web|url=http://www.mellanox.com/page/file_storage/|title=Microsoft Based Solutions - Mellanox Technologies|accessdate=14 October 2014}}</ref><ref name="chelsio">{{cite web|url=http://www.chelsio.com/chelsio-to-demonstrate-40g-smb-direct-rdma-over-ethernet-for-windows-server-2012/|title=40Gbe SMB Direct RDMA Over Ethernet For Windows Server 2012 - Chelsio Communications|date=2 April 2013|publisher=Chelsio Communications|accessdate=14 October 2014|quote=The demonstration will show Microsoft's Windows Server 2012 SMB Direct running at line-rate 40Gb using RDMA over Ethernet (iWARP).}}</ref> Software vendors, such as [[IBM]],<ref>{{Cite web|url=https://www.openfabrics.org/wp-content/uploads/2022-workshop/2022-workshop-presentations/201_RPolig.pdf|title=SOFA-Storage: Creating a Vendor Agnostic Framework to Enable Seamless Storage Offload Using SmartNICs}}</ref> [[Red Hat]] and [[Oracle Corporation]], support these APIs in their latest products,<ref>{{Cite web|url=https://access.redhat.com/solutions/22188|title=What RDMA hardware is supported in Red Hat Enterprise Linux?|date=2 June 2016}}</ref> and since 2013, engineers have been developing network adapters that implement RDMA over Ethernet.<ref name="chelsio"/>
Both [[Red Hat Enterprise Linux]] and [[Red Hat Enterprise MRG]]<ref>{{cite web|url=https://investors.redhat.com/news-and-events/press-releases/2011/06-23-2011|title=Red Hat Enterprise MRG 2.0 Now Available|accessdate=23 June 2011|url-status=dead|archiveurl=https://web.archive.org/web/20160825215016/https://investors.redhat.com/news-and-events/press-releases/2011/06-23-2011|archivedate=25 August 2016}}</ref> support RDMA. Microsoft supports RDMA in [[Windows Server 2012]] via [[Server Message Block|SMB Direct]], and [[VMware ESXi]] has supported RDMA since 2015.
 
Common RDMA implementations include the [[Virtual Interface Architecture]], [[RDMA over Converged Ethernet]] (RoCE), [[InfiniBand]], [[Omni-Path]], [[iWARP]] and Ultra Ethernet.
 
== Using RDMA ==
Applications access RDMA control structures using well-defined APIs originally designed for the InfiniBand protocol, although the APIs can be used with any of the underlying RDMA implementations. Using send and completion queues, applications perform RDMA operations by submitting work queue entries (WQEs) to the submission queue (SQ) and getting notified of responses through the completion queue (CQ).<ref>{{cite web|url=https://dl.acm.org/doi/abs/10.1145/3319647.3325827|title=Storm: a fast transactional dataplane for remote data structures}}</ref>
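
As an illustration, the following sketch shows this submit-and-poll cycle with the libibverbs API. It assumes an already-connected queue pair, a completion queue, a registered buffer, and a remote address and key exchanged out of band; all identifiers are illustrative.

<syntaxhighlight lang="c">
/* Sketch of the submit/complete cycle with libibverbs, assuming an
 * already-connected queue pair `qp`, a completion queue `cq`, a
 * registered buffer `buf`/`mr`, and the peer's `remote_addr`/`rkey`
 * obtained out of band. All names here are illustrative. */
#include <infiniband/verbs.h>
#include <stddef.h>
#include <stdint.h>

int rdma_write_example(struct ibv_qp *qp, struct ibv_cq *cq,
                       void *buf, size_t len, struct ibv_mr *mr,
                       uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    /* Build a work queue entry (WQE) describing a one-sided write. */
    struct ibv_send_wr wr = {
        .wr_id      = 1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,    /* ask for a completion */
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad_wr;
    if (ibv_post_send(qp, &wr, &bad_wr))    /* submit to the SQ */
        return -1;

    /* Busy-poll the completion queue (CQ) for the result. */
    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)
        ;                                    /* spin until complete */
    return wc.status == IBV_WC_SUCCESS ? 0 : -1;
}
</syntaxhighlight>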
 
== Transport types ==
RDMA can transport data reliably or unreliably over the Reliably Connected (RC) and Unreliable Datagram (UD) transport protocols, respectively. The former has the benefit of preserving requests (no requests are lost), while the latter requires fewer queue pairs when handling multiple connections, because UD is connection-less and allows a single host to communicate with any other using a single queue.<ref>{{cite web|url=https://dl.acm.org/doi/pdf/10.1145/3319647.3325827|title=Storm: a fast transactional dataplane for remote data structures}}</ref>
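
The following sketch illustrates this choice of transport when creating a queue pair with libibverbs; only the queue pair type differs between the two transports, and the queue capacities shown are illustrative assumptions.

<syntaxhighlight lang="c">
/* Sketch: choosing the transport when creating a queue pair with
 * libibverbs. Only qp_type differs between a Reliably Connected (RC)
 * and an Unreliable Datagram (UD) queue pair; the capacities below
 * are illustrative assumptions. */
#include <infiniband/verbs.h>

struct ibv_qp *create_qp(struct ibv_pd *pd, struct ibv_cq *cq,
                         int use_reliable)
{
    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap = {
            .max_send_wr  = 64,   /* outstanding send WQEs */
            .max_recv_wr  = 64,   /* outstanding receive WQEs */
            .max_send_sge = 1,
            .max_recv_sge = 1,
        },
        /* RC preserves and acknowledges every request but needs one
         * queue pair per peer; UD is connection-less, so one queue
         * pair can address any number of peers. */
        .qp_type = use_reliable ? IBV_QPT_RC : IBV_QPT_UD,
    };
    return ibv_create_qp(pd, &attr);
}
</syntaxhighlight>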
 
== References ==
{{Reflist}}
 
== External links ==
* [http://www.rdmaconsortium.org/home RDMA Consortium]
* {{IETF RFC|5040}}: A Remote Direct Memory Access Protocol Specification
* [http://www.ammasso.com Ammasso RDMA Ethernet Adapter]
* [http://www.hpcwire.com/2006/09/15/a_tutorial_of_the_rdma_model-1/ A Tutorial of the RDMA Model]
* [https://www.hpcwire.com/2006/10/06/why_compromise-1/ "Why Compromise?"] // HPCwire, Gilad Shainer (Mellanox Technologies), 2006
* [http://www.hpcwire.com/hpcwire/2006-08-18/a_critique_of_rdma-1.html A Critique of RDMA] for high-performance computing
* [https://www.cs.utah.edu/~stutsman/cs6450/public/papers/rdma.pdf RDMA Reads: To Use or Not to Use?]
* [https://www.openfabrics.org/wp-content/uploads/2022-workshop/2022-workshop-presentations/201_RPolig.pdf SOFA-Storage: Creating a Vendor Agnostic Framework to Enable Seamless Storage Offload Using SmartNICs]
 
[[Category:Computer memory]]
[[Category:Operating system technology]]
[[Category:Local area networks]]