RDMA Overview

Remote Direct Memory Access (RDMA) lets one host read from or write to another host's memory without involving either host's CPU or kernel in the data path. The transfer is offloaded to the network adapter: a Network Interface Controller (NIC), Data Processing Unit (DPU), or Host Channel Adapter (HCA). This brings three main benefits:

  • Reduced Latency: Removing the kernel and CPU from the data path reduces overhead.
  • Increased Throughput: The adapter moves data directly between application buffers and the network, avoiding intermediate kernel copies.
  • Reduced Load on CPU: Frees up the host CPU for application logic.

Applications allocate buffers in user-space virtual memory and register them with the HCA. The HCA can then read from and write to these registered buffers directly, bypassing the kernel and CPU.

The Send/Receive (two-sided) semantic is similar to traditional socket programming (send() / recv()): the receiver must post a receive buffer before the sender transmits.

Send Operation Example:

  1. Recv Side: Allocates a receive buffer in user-space virtual memory, registers it with the HCA, and posts a Work Request (WR) to the Receive Queue. The queued WR is called a Work Queue Element (WQE).
  2. Send Side: Allocates a send buffer in user-space virtual memory, registers it with the HCA, and posts a Work Request (WR) to the Send Queue.
  3. HCA Execution: The sending HCA executes the send WQE, reads data from the send buffer, and sends it to the remote side. It then generates a Completion Queue Element (CQE) to notify the sending application.
  4. Recv Execution: When data arrives, the receiving HCA consumes the receive WQE, places the data in the receive buffer in host memory, and generates a CQE to notify the application that data is ready.
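
The four steps above can be sketched as a toy simulation. This is only a model of the queue mechanics, not the real verbs API: the ToyHCA class and its method names (post_recv, post_send, process_send, on_receive) are hypothetical, and the "network" is a direct method call.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyHCA:
    """Hypothetical stand-in for an HCA: two work queues and one completion queue."""
    name: str
    send_queue: deque = field(default_factory=deque)
    recv_queue: deque = field(default_factory=deque)
    completion_queue: deque = field(default_factory=deque)

    def post_recv(self, wr_id: int, buffer: bytearray) -> None:
        # Step 1: the application posts a receive WR; it waits on the queue as a WQE.
        self.recv_queue.append((wr_id, buffer))

    def post_send(self, wr_id: int, buffer: bytes) -> None:
        # Step 2: the application posts a send WR describing the send buffer.
        self.send_queue.append((wr_id, buffer))

    def process_send(self, remote: "ToyHCA") -> None:
        # Step 3: execute the send WQE, read the buffer, deliver it to the
        # remote side, then generate a CQE for the sending application.
        wr_id, buffer = self.send_queue.popleft()
        remote.on_receive(bytes(buffer))
        self.completion_queue.append(("SEND", wr_id, len(buffer)))

    def on_receive(self, payload: bytes) -> None:
        # Step 4: consume one receive WQE, place the data in its buffer,
        # and generate a CQE so the application knows data is ready.
        if not self.recv_queue:
            raise RuntimeError("no receive WQE posted")
        wr_id, buffer = self.recv_queue.popleft()
        if len(payload) > len(buffer):
            raise RuntimeError("receive buffer too small")
        buffer[:len(payload)] = payload
        self.completion_queue.append(("RECV", wr_id, len(payload)))


receiver = ToyHCA("receiver")
sender = ToyHCA("sender")

recv_buf = bytearray(16)
receiver.post_recv(wr_id=1, buffer=recv_buf)   # receiver goes first
sender.post_send(wr_id=2, buffer=b"hello")
sender.process_send(receiver)
```

Both sides end up with one completion each, which mirrors the two-sided nature of Send/Receive: if the receiver had not posted its WR first, the toy model raises an error instead of delivering the data.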

The RDMA Read/Write (one-sided) semantic allows one side to read or write directly to the memory of the other, without the remote application taking part in the transfer.

RDMA Write Operation Example:

  1. Recv Side: Allocates a buffer in user-space virtual memory and registers it with the HCA with remote write access. It does not post a Work Request on the Receive Queue; it only shares the buffer’s virtual address and remote key (rkey) with the sender ahead of time.
  2. Send Side: Allocates a send buffer in user-space virtual memory and registers it with the HCA. It places a WQE in the Send Queue containing the remote side’s virtual address and remote key (rkey).
  3. HCA Execution: The sending HCA executes the send WQE, reads data from the host buffer, sends it to the remote HCA, and generates a CQE for the sending application.
  4. Recv Execution: When data arrives at the receiving HCA, it checks the address and memory key and writes directly to host memory without involving the remote CPU. A plain RDMA Write generates no CQE on the receiving side.
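
The RDMA Write flow can be sketched the same way, as a toy simulation. Everything here is an assumption made for illustration: the ToyRdmaHCA class, its methods, the made-up virtual addresses, and the counter-based keys are hypothetical and do not correspond to a real library. The point is the check in the last step: the target HCA validates the key and address range, then writes with no application involvement.

```python
from collections import deque


class ToyRdmaHCA:
    """Hypothetical HCA model holding a table of registered memory regions."""

    def __init__(self) -> None:
        self.regions = {}                 # key -> (base virtual address, buffer)
        self.completion_queue = deque()
        self._next_addr = 0x1000          # made-up virtual address allocator
        self._next_key = 0x100            # made-up key generator

    def register(self, buffer: bytearray):
        """Register a buffer; return (virtual address, key) to share with the peer."""
        addr, key = self._next_addr, self._next_key
        self._next_addr += len(buffer) + 0x1000
        self._next_key += 1
        self.regions[key] = (addr, buffer)
        return addr, key

    def rdma_write(self, wr_id, local: bytes, remote, remote_addr, key) -> None:
        # Step 3: execute the send WQE, read the local buffer, send it to the
        # remote HCA, and generate a CQE on the sending side only.
        remote.handle_write(remote_addr, key, bytes(local))
        self.completion_queue.append(("RDMA_WRITE", wr_id, len(local)))

    def handle_write(self, addr: int, key: int, payload: bytes) -> None:
        # Step 4: validate the key and address range, then write directly into
        # the registered buffer. No CQE is generated on this side.
        if key not in self.regions:
            raise PermissionError("unknown memory key")
        base, buf = self.regions[key]
        offset = addr - base
        if offset < 0 or offset + len(payload) > len(buf):
            raise PermissionError("write outside registered region")
        buf[offset:offset + len(payload)] = payload


target = ToyRdmaHCA()
initiator = ToyRdmaHCA()

target_buf = bytearray(8)
remote_addr, remote_key = target.register(target_buf)   # step 1, shared out of band
initiator.rdma_write(wr_id=7, local=b"data", remote=target,
                     remote_addr=remote_addr, key=remote_key)
```

After the call, target_buf holds the payload and only the initiator has a completion. A write carrying a key the target never issued, or an address outside the registered range, is rejected in handle_write.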