Transport Layer
The Transport Layer in InfiniBand is responsible for delivering messages between two endpoints, either reliably or unreliably. It provides end-to-end virtual channels that connect two applications running in separate address spaces.
Queue Pairs (QP)
Endpoints of a virtual channel are called Queue Pairs (QPs). A QP consists of two queues:
- Send Queue: Holds work requests for outgoing data.
- Receive Queue: Holds work requests that describe buffers for incoming data.
QPs provide a framework for applications to transfer data between each other, bypassing the kernel and using the InfiniBand Host Channel Adapters (HCAs) to manage reliability. Each QP is identified by a 24-bit Queue Pair Number (QPN).
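As a rough illustration of the structure described above, a QP can be modeled as a numbered pair of queues. This is a toy sketch, not the verbs API: the `QueuePair` class and its field names are hypothetical, and the only facts it encodes are the two queues and the 24-bit QPN range.

```python
from collections import deque
from dataclasses import dataclass, field

QPN_BITS = 24
MAX_QPN = (1 << QPN_BITS) - 1  # 0xFFFFFF, the largest value a 24-bit QPN can hold


@dataclass
class QueuePair:
    """Toy model of a QP: a numbered pair of work queues (hypothetical class)."""
    qpn: int
    send_queue: deque = field(default_factory=deque)  # outgoing requests
    recv_queue: deque = field(default_factory=deque)  # entries for incoming data

    def __post_init__(self):
        # Reject identifiers that cannot be represented in 24 bits.
        if not 0 <= self.qpn <= MAX_QPN:
            raise ValueError(f"QPN {self.qpn:#x} does not fit in {QPN_BITS} bits")


qp = QueuePair(qpn=0x000123)
qp.send_queue.append("send request")
qp.recv_queue.append("receive buffer")
```

A QPN of `1 << 24` or larger raises `ValueError` in this model, since it would overflow the 24-bit field.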
QP Workflow
- Work Queue: Applications interface to the IB fabric via a Work Queue.
- Work Request (WR): When an application wants to send data, it posts a Work Request to the Work Queue.
- Work Queue Element (WQE): The WR is placed on the Work Queue as a Work Queue Element.
- Completion Queue Element (CQE): Once the HCA completes the WQE, it places a Completion Queue Element on a Completion Queue (CQ).
- Status: The application checks the CQ to determine the status of its Work Request.
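The workflow above can be sketched as a small simulation. This is a toy model under stated assumptions: `ToyHCA` and the three record types are hypothetical, the method names `post_send` and `poll_cq` only loosely echo the libibverbs calls `ibv_post_send` and `ibv_poll_cq`, and the "empty payload is an error" rule is invented purely to show a non-success status.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int       # application-chosen identifier
    payload: bytes


@dataclass
class WorkQueueElement:
    wr_id: int
    payload: bytes


@dataclass
class CompletionQueueElement:
    wr_id: int
    status: str      # "success" or an error string in this toy model


class ToyHCA:
    """Stands in for the adapter: one work queue and one completion queue."""

    def __init__(self):
        self.work_queue = deque()
        self.completion_queue = deque()

    def post_send(self, wr):
        # Posting a WR places a WQE on the work queue.
        self.work_queue.append(WorkQueueElement(wr.wr_id, wr.payload))

    def process(self):
        # The adapter consumes WQEs in order and reports each one with a CQE.
        while self.work_queue:
            wqe = self.work_queue.popleft()
            status = "success" if wqe.payload else "error: empty payload"
            self.completion_queue.append(CompletionQueueElement(wqe.wr_id, status))

    def poll_cq(self):
        # Return the next completion, or None if nothing has completed yet.
        return self.completion_queue.popleft() if self.completion_queue else None


hca = ToyHCA()
hca.post_send(WorkRequest(wr_id=1, payload=b"hello"))
hca.post_send(WorkRequest(wr_id=2, payload=b""))
hca.process()
first = hca.poll_cq()
second = hca.poll_cq()
```

The application never sees the WQE directly; it only matches the `wr_id` on each completion back to the request it posted.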
Transport Service Types
Each QP is assigned a specific transport mode. Both the source and destination QPs must use the same mode. The four main modes are:
- Reliable Connection (RC)
- Unreliable Connection (UC)
- Reliable Datagram (RD)
- Unreliable Datagram (UD)
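The four modes are the combinations of two independent properties, which a short sketch can make explicit. The `Transport` enum and `can_communicate` helper are hypothetical names used only for illustration.

```python
from enum import Enum


class Transport(Enum):
    # value = (reliable, connected)
    RC = (True, True)
    UC = (False, True)
    RD = (True, False)
    UD = (False, False)

    @property
    def reliable(self):
        return self.value[0]

    @property
    def connected(self):
        return self.value[1]


def can_communicate(local, remote):
    """Both ends of a channel must be configured with the same transport mode."""
    return local is remote


rc_to_rc = can_communicate(Transport.RC, Transport.RC)
rc_to_ud = can_communicate(Transport.RC, Transport.UD)
```

Here `rc_to_rc` is `True` and `rc_to_ud` is `False`, reflecting the rule that source and destination QPs must use the same mode.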
Connected vs Datagram Mode
- Connected Mode (RC, UC):
  - Requires a dedicated QP on both the source and destination (one QP per connection).
  - Supports messages larger than the MTU (HCA handles segmentation/reassembly).
  - Generally more performant than Datagram mode.
  - Consumes more kernel memory (scaling with number of connections).
  - Commonly the default for most messaging.
- Datagram Mode (UD, RD):
  - Can use a single QP for multiple connections (sending/receiving from multiple remote QPs).
  - UD does NOT support segmentation (each message must fit within the MTU).
  - Generally slower than Connected mode, but scales better for one-to-many communication because it uses less memory.
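The scaling difference comes down to counting QPs. The sketch below assumes an all-to-all pattern in which every process talks to every other process, one QP per connection in Connected mode, and one shared QP per process in Datagram mode. The function names are hypothetical.

```python
def qps_per_process(peers, connected):
    """QPs one process needs in order to reach `peers` remote endpoints."""
    return peers if connected else 1


def fabric_wide_qps(processes, connected):
    """Total QPs when every process talks to every other process."""
    return processes * qps_per_process(processes - 1, connected)


connected_total = fabric_wide_qps(1000, connected=True)   # 1000 * 999 = 999000
datagram_total = fabric_wide_qps(1000, connected=False)   # 1000 * 1   = 1000
```

Under these assumptions the connected QP count grows quadratically with the number of processes, while the datagram count grows linearly, which is why Datagram mode uses less memory at scale.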
Reliable vs Unreliable
- Reliable Mode (RC, RD):
  - Uses a Packet Sequence Number (PSN) to track packets (similar to TCP Sequence Numbers).
  - Receiver sends ACKs (Acknowledgements) and NAKs (Negative Acknowledgements) to notify sender of status.
  - Sender QP maintains a timer to catch undelivered packets (retransmission).
- Unreliable Mode (UC, UD):
  - Does not track if packets are received.
  - Similar to UDP.
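A minimal sketch of the receiver side of reliable mode shows how PSN tracking produces ACKs and NAKs. It is deliberately simplified: the `ReliableReceiver` class is hypothetical, any out-of-sequence packet is answered with a NAK carrying the expected PSN, and the sender's retransmission timer is not modeled. PSNs are treated as 24-bit values that wrap around.

```python
PSN_MASK = (1 << 24) - 1  # PSNs are 24-bit and wrap around


class ReliableReceiver:
    """Toy receiver that accepts packets only in PSN order (hypothetical class)."""

    def __init__(self, initial_psn=0):
        self.expected_psn = initial_psn & PSN_MASK
        self.delivered = []

    def receive(self, psn, payload):
        if psn == self.expected_psn:
            # In-order packet: deliver it and advance the expected PSN.
            self.delivered.append(payload)
            self.expected_psn = (self.expected_psn + 1) & PSN_MASK
            return ("ACK", psn)
        # Out-of-sequence packet: tell the sender which PSN is still missing.
        return ("NAK", self.expected_psn)


rx = ReliableReceiver()
r0 = rx.receive(0, b"a")        # in order
r_gap = rx.receive(2, b"c")     # packet 1 was lost, so this one is rejected
r1 = rx.receive(1, b"b")        # sender retransmits from PSN 1
r2 = rx.receive(2, b"c")
```

After the retransmission, `rx.delivered` holds the three payloads in their original order. An unreliable QP would skip this bookkeeping entirely and simply never learn that packet 1 went missing.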
Segmentation and Reassembly
The Transport Layer handles breaking down large messages into smaller packets that fit the Maximum Transmission Unit (MTU).
- MTU Sizes: Typically 256, 512, 1024, 2048, or 4096 bytes (default is often 4096).
- Segmentation: If a payload is larger than the MTU, it is segmented into multiple packets.
- Reassembly: The receiving end reassembles the packets into the original message.
- Datagram Limit: Unreliable Datagram (UD) does not support segmentation; the application must ensure each message fits within the MTU.
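Segmentation and reassembly can be illustrated with plain byte slicing. This is a sketch of the idea only: on real hardware the HCA does this work, packet headers are ignored here, and the helper names `segment`, `reassemble`, and `check_datagram` are hypothetical.

```python
def segment(message, mtu):
    """Split a message into payloads of at most `mtu` bytes each."""
    if mtu <= 0:
        raise ValueError("MTU must be positive")
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]


def reassemble(packets):
    """Join in-order payloads back into the original message."""
    return b"".join(packets)


def check_datagram(message, mtu):
    """UD has no segmentation, so the whole message must fit in one packet."""
    if len(message) > mtu:
        raise ValueError(f"{len(message)}-byte message exceeds MTU of {mtu}")
    return message


message = bytes(10000)
packets = segment(message, mtu=4096)
sizes = [len(p) for p in packets]   # [4096, 4096, 1808]
restored = reassemble(packets)
```

A 10000-byte message with a 4096-byte MTU becomes three packets, and `reassemble` recovers the original bytes. Passing the same message to `check_datagram` raises `ValueError`, which mirrors the UD restriction that the application must keep messages within the MTU.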