Transport Layer

The Transport Layer in InfiniBand is responsible for the reliable or unreliable delivery of messages between two endpoints. It provides end-to-end virtual channels that connect two applications running in separate address spaces.

Endpoints of a virtual channel are called Queue Pairs (QPs). A QP consists of two queues:

  • Send Queue: Holds requests for outgoing messages.
  • Receive Queue: Holds requests that describe the buffers where incoming messages are placed.

QPs provide a framework for applications to transfer data between each other, bypassing the kernel and using the InfiniBand Host Channel Adapters (HCAs) to manage reliability. Each QP is identified by a 24-bit Queue Pair Number (QPN).

A request moves through a QP in the following steps:

  1. Work Queue: Applications interface with the IB fabric through a Work Queue (the Send Queue or Receive Queue of a QP).
  2. Work Request (WR): To send data, the application posts a Work Request to the Work Queue.
  3. Work Queue Element (WQE): The WR is placed on the Work Queue as a Work Queue Element.
  4. Completion Queue Element (CQE): Once the HCA completes the WQE, it places a Completion Queue Element on a Completion Queue (CQ).
  5. Status: The application polls the CQ to determine the status of its Work Request.
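The steps above can be sketched with a small toy model. This is only an illustration of the WR → WQE → CQE flow, not the verbs API: the class and method names (`ToyQueuePair`, `post_send`, `hca_process_one`, `poll_cq`) are hypothetical, and the "HCA" here simply marks every request as successful.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class WorkRequest:
    wr_id: int      # caller-chosen identifier, echoed back in the completion
    payload: bytes


@dataclass
class Completion:
    wr_id: int
    status: str     # always "success" in this toy model


class ToyQueuePair:
    """Toy stand-in for a send queue and its completion queue (hypothetical, not a real API)."""

    def __init__(self):
        self.send_queue = deque()        # holds WQEs
        self.completion_queue = deque()  # holds CQEs

    def post_send(self, wr):
        # Steps 2-3: the posted WR sits on the work queue as a WQE.
        self.send_queue.append(wr)

    def hca_process_one(self):
        # Step 4: the simulated HCA consumes the oldest WQE and emits a CQE.
        if not self.send_queue:
            return False
        wqe = self.send_queue.popleft()
        self.completion_queue.append(Completion(wqe.wr_id, "success"))
        return True

    def poll_cq(self):
        # Step 5: the application polls; None means no completion is available yet.
        return self.completion_queue.popleft() if self.completion_queue else None


qp = ToyQueuePair()
qp.post_send(WorkRequest(wr_id=1, payload=b"hello"))
assert qp.poll_cq() is None   # nothing completes until the HCA processes the WQE
qp.hca_process_one()
cqe = qp.poll_cq()
print(cqe.wr_id, cqe.status)
```

The point of the model is the decoupling: posting a request returns immediately, and the application learns the outcome later by polling the CQ.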

Each QP is assigned a specific transport mode. Both the source and destination QPs must use the same mode. The four main modes are:

  1. Reliable Connection (RC)
  2. Unreliable Connection (UC)
  3. Reliable Datagram (RD)
  4. Unreliable Datagram (UD)
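These modes combine two independent properties, connection type (connected or datagram) and reliability (reliable or unreliable):

  • Connected Mode (RC, UC):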

    • Requires a dedicated QP on both the source and destination (one QP per connection).
    • Supports messages larger than the MTU (HCA handles segmentation/reassembly).
    • Generally more performant than Datagram mode.
    • Consumes more kernel memory (scaling with number of connections).
    • Commonly the default for most messaging.
  • Datagram Mode (UD, RD):

    • A single QP can send to and receive from multiple remote QPs.
    • UD does NOT support segmentation (each message must fit in a single packet, so it cannot exceed the MTU).
    • Generally slower than Connected mode, but scales better for one-to-many communication because it uses less memory.
  • Reliable Mode (RC, RD):

    • Uses a 24-bit Packet Sequence Number (PSN) to track packets (similar to TCP sequence numbers).
    • The receiver sends ACKs (Acknowledgements) and NAKs (Negative Acknowledgements) to notify the sender of delivery status.
    • The sender QP maintains a timer and retransmits packets that are not acknowledged before it expires.
  • Unreliable Mode (UC, UD):

    • Does not acknowledge or retransmit packets, so the sender cannot tell whether they were received.
    • Similar to UDP.
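To make the reliable-mode bookkeeping concrete, here is a minimal sketch of a receiver that tracks the expected PSN and answers with an ACK or a NAK. It is a simplification under stated assumptions: the class name `ToyReliableReceiver` is hypothetical, every packet is acknowledged individually, and the real protocol's ACK coalescing and NAK codes are left out.

```python
PSN_MODULUS = 1 << 24  # PSNs are 24 bits wide and wrap around


class ToyReliableReceiver:
    """Toy in-order receiver: ACKs the expected PSN, NAKs a gap, re-ACKs duplicates."""

    def __init__(self, initial_psn=0):
        self.expected_psn = initial_psn % PSN_MODULUS
        self.delivered = []

    def on_packet(self, psn, payload):
        if psn == self.expected_psn:
            # In-order packet: deliver it and advance the expected PSN (with wraparound).
            self.delivered.append(payload)
            self.expected_psn = (self.expected_psn + 1) % PSN_MODULUS
            return ("ACK", psn)
        # How far the packet lies behind the expected PSN, computed modulo 2**24.
        behind = (self.expected_psn - psn) % PSN_MODULUS
        if behind < PSN_MODULUS // 2:
            # Duplicate of an already delivered packet: acknowledge again, do not redeliver.
            return ("ACK", psn)
        # The packet is ahead of the expected PSN, so something was lost: request the missing PSN.
        return ("NAK", self.expected_psn)


rx = ToyReliableReceiver(initial_psn=PSN_MODULUS - 1)
print(rx.on_packet(PSN_MODULUS - 1, b"a"))  # in order; the expected PSN wraps to 0
print(rx.on_packet(1, b"c"))                # PSN 0 is missing, so the receiver NAKs
print(rx.on_packet(0, b"b"))                # retransmission fills the gap
print(rx.on_packet(0, b"b"))                # duplicate is acknowledged but not redelivered
```

An unreliable receiver would skip all of this: it would accept or drop each packet without sending anything back.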

The Transport Layer breaks large messages into smaller packets that fit within the Maximum Transmission Unit (MTU).

  • MTU Sizes: InfiniBand defines MTUs of 256, 512, 1024, 2048, and 4096 bytes (the default is often 4096).
  • Segmentation: If a payload is larger than the MTU, it is segmented into multiple packets.
  • Reassembly: The receiving end reassembles the packets into the original message.
  • Datagram Limit: Unreliable Datagram (UD) does not support segmentation; the application must ensure each message fits within the MTU.
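Segmentation and reassembly can be illustrated in a few lines. In practice the HCA does this in hardware; the sketch below only shows the arithmetic, and the helper names (`segment`, `reassemble`, `fits_in_one_packet`) are hypothetical. It also ignores packet headers and treats the MTU purely as a payload limit.

```python
VALID_MTUS = (256, 512, 1024, 2048, 4096)


def segment(message, mtu):
    """Split a message into MTU-sized payload chunks; only the last may be shorter."""
    if mtu not in VALID_MTUS:
        raise ValueError(f"unsupported MTU: {mtu}")
    if not message:
        return [b""]  # in this sketch an empty message still occupies one packet
    return [message[i:i + mtu] for i in range(0, len(message), mtu)]


def reassemble(packets):
    """Concatenate payload chunks, assumed to arrive in order, into the original message."""
    return b"".join(packets)


def fits_in_one_packet(message, mtu):
    """Check an application would need before sending over UD, which never segments."""
    return len(message) <= mtu


message = bytes(10000)
packets = segment(message, 4096)
print([len(p) for p in packets])          # two full packets plus a shorter final one
print(reassemble(packets) == message)
print(fits_in_one_packet(message, 4096))  # too large to send as a single UD message
```

With a 4096-byte MTU, a 10000-byte message needs three packets, which is why a UD sender has to split such a message itself or use a connected mode instead.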