InfiniBand Overview
Introduction
InfiniBand is an open-standard network communications protocol developed by the InfiniBand Trade Association (IBTA) - www.infinibandta.org
For a deep dive into InfiniBand concepts, see the Mellanox InfiniBand FAQ and the Introduction to InfiniBand Whitepaper. Note that these documents are from around 2014, so some throughput specifications may be outdated, but the core concepts remain relevant.
Prominent members of the IBTA are as follows:
- Nvidia
- Intel
- IBM
- Oracle
- HPE
InfiniBand is a high-throughput, low-latency networking specification used to interconnect servers, switches, storage, and embedded systems.
It is heavily used in artificial intelligence and data science because it supports very high bandwidth with low latency and strong scalability and flexibility.
Bandwidth Details
InfiniBand originally offered 10Gb/s starting in 2002 and has grown to 1600Gb/s as of 2025. InfiniBand has always been non-blocking and bidirectional (full-duplex).
- 10Gb/s SDR (2002)
- 40Gb/s QDR (2008)
- 56Gb/s FDR (2011)
- 100Gb/s EDR (2015)
- 200Gb/s HDR (2018)
- 400Gb/s NDR (2021)
- 800Gb/s XDR (2023)
- 1600Gb/s GDR (2025)
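The headline rates above are aggregate rates for 4-lane ports; the usable data rate also depends on the line encoding (8b/10b for the early generations, 64b/66b from FDR onward). A minimal Python sketch of that arithmetic — the per-lane rates and encodings used here are the commonly published values, shown for illustration:

```python
# Effective data rate of a 4x InfiniBand port, accounting for line encoding.
# Per-lane signaling rates (Gb/s) are the commonly published figures.
GENERATIONS = {
    # name: (per-lane signaling Gb/s, data bits, encoded bits)
    "SDR": (2.5, 8, 10),       # 8b/10b encoding (20% overhead)
    "QDR": (10.0, 8, 10),      # 8b/10b encoding
    "FDR": (14.0625, 64, 66),  # 64b/66b encoding (~3% overhead)
    "EDR": (25.78125, 64, 66),
}

def effective_rate(gen: str, lanes: int = 4) -> float:
    """Usable data rate in Gb/s for an aggregated port."""
    signal, data_bits, raw_bits = GENERATIONS[gen]
    return lanes * signal * data_bits / raw_bits

print(round(effective_rate("QDR"), 1))  # 32.0 -> usable rate of a "40Gb/s" QDR port
print(round(effective_rate("EDR"), 1))  # 100.0 -> 64b/66b is nearly lossless
```

This is why a "40Gb/s" QDR port delivers roughly 32Gb/s of payload bandwidth, while EDR and later generations deliver close to their marketed rates.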
Nvidia, for example, currently ships NDR 400Gb/s fabrics and will continue to increase speeds as newer specifications are released by the IBTA.
Port Structure
InfiniBand port bandwidth is achieved by aggregating multiple physical lanes. A port can support up to 12 physical lanes but is typically implemented with 4.
For example, an HDR port contains 4 physical lanes, each with a bidirectional bandwidth of 50Gb/s (200Gb/s total).
graph LR
%%{init: {'theme': 'base', 'themeVariables': { 'edgeLabelBackground': '#ffffff'}}}%%
A[HDR Port<br/>200Gb/s Total] -->|Lane 1<br/>50Gb/s| L1[Physical Lane 1]
A -->|Lane 2<br/>50Gb/s| L2[Physical Lane 2]
A -->|Lane 3<br/>50Gb/s| L3[Physical Lane 3]
A -->|Lane 4<br/>50Gb/s| L4[Physical Lane 4]
style A fill:#e0f2fe,stroke:#0369a1,stroke-width:2px,color:#000000
style L1 fill:#d1fae5,stroke:#10b981,color:#000000
style L2 fill:#d1fae5,stroke:#10b981,color:#000000
style L3 fill:#d1fae5,stroke:#10b981,color:#000000
style L4 fill:#d1fae5,stroke:#10b981,color:#000000
Diagram: An HDR port aggregates four physical lanes, each capable of 50Gb/s in both directions, totaling 200Gb/s full-duplex bandwidth.
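The lane arithmetic above generalizes to other port widths. A small sketch — the 1x/4x/8x/12x widths are the ones the specification defines, and the 12x HDR figure is shown only as a hypothetical:

```python
def port_bandwidth(per_lane_gbps: float, lanes: int = 4) -> float:
    """Aggregate per-direction bandwidth of an InfiniBand port."""
    if lanes not in (1, 4, 8, 12):  # port widths defined by the IB spec
        raise ValueError(f"unsupported lane count: {lanes}")
    return per_lane_gbps * lanes

print(port_bandwidth(50.0))      # HDR 4x -> 200.0 Gb/s
print(port_bandwidth(50.0, 12))  # hypothetical 12x HDR -> 600.0 Gb/s
```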
InfiniBand components
- HCA (Host Channel Adapter) - InfiniBand network adapter installed in a server.
- Switch - InfiniBand switch that moves packets within an IB subnet.
- Router - InfiniBand router that moves packets between IB subnets.
- Gateway - Device that bridges an IB fabric to an Ethernet network, enabling IB hosts to communicate with Ethernet hosts.
- Subnet Manager - Software that discovers and manages InfiniBand Nodes and Links within a subnet.
Low Latency Overview
Latencies down to roughly 1000 nanoseconds can be achieved through IB-specific acceleration and offloading mechanisms such as RDMA (Remote Direct Memory Access). RDMA allows applications to bypass the OS kernel entirely and access memory directly: the IB HCA moves data between application-owned memory buffers, removing the need for the kernel to manage the transfer. With GPUDirect RDMA, data can even be transferred directly between GPU memories, managed entirely by the HCA.
flowchart TB
%%{init: {'theme': 'base', 'themeVariables': { 'edgeLabelBackground': '#ffffff'}}}%%
subgraph H1[Host 1]
direction TB
H1APP[Application]
H1K[Kernel]
H1HCA[HCA]
H1APP -.->|Traditional| H1K -.-> H1HCA
end
subgraph H2[Host 2]
direction TB
H2APP[Application]
H2K[Kernel]
H2HCA[HCA]
H2HCA -.-> H2K -.->|Traditional| H2APP
end
H1APP ==>|RDMA Bypass| H1HCA
H1HCA <-->|IB Fabric| H2HCA
H2HCA ==>|RDMA Bypass| H2APP
style H1APP fill:#f7fee7,stroke:#65a30d,color:#000000
style H1K fill:#fee2e2,stroke:#db2777,stroke-dasharray: 5,color:#000000
style H1HCA fill:#dbeafe,stroke:#2563eb,color:#000000
style H2APP fill:#f7fee7,stroke:#65a30d,color:#000000
style H2K fill:#fee2e2,stroke:#db2777,stroke-dasharray: 5,color:#000000
style H2HCA fill:#dbeafe,stroke:#2563eb,color:#000000
Diagram: The traditional path (dotted lines) routes data through the OS kernel. With RDMA (thick arrows), the application bypasses the kernel entirely and communicates directly with the HCA.
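To make the contrast concrete, here is a toy sketch of the stages each path traverses (the stage names are illustrative, not an exact driver-level trace):

```python
# Toy model: stages data passes through on a traditional socket path
# versus an RDMA kernel-bypass path. Fewer stages means fewer copies,
# fewer context switches, and lower latency.
TRADITIONAL_PATH = [
    "app buffer -> kernel socket buffer",  # copy + syscall
    "kernel -> NIC",                       # driver involvement
    "wire",
    "NIC -> kernel socket buffer",
    "kernel -> app buffer",                # copy + process wakeup
]
RDMA_PATH = [
    "app buffer -> HCA",  # HCA DMAs directly from registered memory
    "wire",
    "HCA -> app buffer",  # HCA DMAs directly into registered memory
]

print(len(TRADITIONAL_PATH), "stages vs", len(RDMA_PATH), "stages")
```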
Subnet Manager
An InfiniBand fabric requires a running Subnet Manager: software that discovers, configures, and manages the fabric.
The Subnet Manager is responsible for the following:
- Node and Link Discovery.
- Local identifier (LID) assignments (similar to MAC addresses in Ethernet).
- Routing table calculations and deployments.
- Configuring node and port parameters such as the QoS policy.
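The first three responsibilities can be sketched in miniature — a toy Subnet Manager (illustrative only; the fabric topology, LID scheme, and routing algorithm here are simplified assumptions) that assigns LIDs to discovered nodes and computes a shortest-path forwarding table per switch:

```python
from collections import deque

# Tiny hypothetical subnet: two switches, three HCAs (adjacency list).
FABRIC = {
    "sw1": ["hca1", "hca2", "sw2"],
    "sw2": ["sw1", "hca3"],
    "hca1": ["sw1"], "hca2": ["sw1"], "hca3": ["sw2"],
}

def assign_lids(fabric):
    """Give every discovered node a unique local identifier (LID)."""
    return {node: lid for lid, node in enumerate(sorted(fabric), start=1)}

def next_hops(fabric, src):
    """BFS from src: for each destination, the first hop to forward to."""
    table, seen = {}, {src}
    queue = deque((nbr, nbr) for nbr in fabric[src])
    while queue:
        node, first_hop = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        table[node] = first_hop
        queue.extend((nbr, first_hop) for nbr in fabric[node])
    return table

print(assign_lids(FABRIC)["hca1"])       # 1
print(next_hops(FABRIC, "sw1")["hca3"])  # sw2 -> hca3 is reached via sw2
```

A real Subnet Manager such as OpenSM does this over the fabric's management datagrams, with far more sophisticated routing engines.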
Scalability
A single InfiniBand subnet can scale up to roughly 48k nodes, and fabrics can grow beyond this limit by connecting multiple subnets with an InfiniBand router.
flowchart TB
%%{init: {'theme': 'base', 'themeVariables': { 'edgeLabelBackground': '#ffffff'}}}%%
subgraph S1 [Subnet 1]
direction TB
N1[Node 1]
N2[Node 2]
N3[... up to 48k Nodes ...]
SM1[Subnet Manager]
end
subgraph S2 [Subnet 2]
direction TB
N4[Node 1]
N5[Node 2]
N6[... up to 48k Nodes ...]
SM2[Subnet Manager]
end
Router[InfiniBand Router]
S1 <--> Router <--> S2
style S1 fill:#f0fdf4,stroke:#16a34a,color:#000000
style S2 fill:#f0fdf4,stroke:#16a34a,color:#000000
style Router fill:#fef9c3,stroke:#ca8a04,stroke-width:2px,color:#000000
style N1 fill:#ffffff,stroke:#374151,color:#000000
style N2 fill:#ffffff,stroke:#374151,color:#000000
style N3 fill:#ffffff,stroke:#374151,color:#000000
style SM1 fill:#ffffff,stroke:#374151,color:#000000
style N4 fill:#ffffff,stroke:#374151,color:#000000
style N5 fill:#ffffff,stroke:#374151,color:#000000
style N6 fill:#ffffff,stroke:#374151,color:#000000
style SM2 fill:#ffffff,stroke:#374151,color:#000000
Diagram: Scaling beyond the 48k node subnet limit by connecting multiple subnets via an InfiniBand Router.
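The 48k figure falls out of the LID address space: LIDs are 16-bit values, and only part of the range is available for unicast addressing. A quick check of the arithmetic:

```python
# LIDs are 16-bit. The unicast range is 0x0001-0xBFFF: 0x0000 is
# reserved, and 0xC000-0xFFFE is the multicast range.
UNICAST_FIRST, UNICAST_LAST = 0x0001, 0xBFFF
unicast_lids = UNICAST_LAST - UNICAST_FIRST + 1
print(unicast_lids)  # 49151, commonly rounded to "48k"
```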
Adaptive Routing
Adaptive Routing is enabled on all Nvidia (Mellanox) IB switches and offers several capabilities.
Link Failure Recovery
The Subnet Manager computes the routing table and pushes it out to the IB switches. A link failure can cause up to 5 seconds of downtime before the Subnet Manager recalculates a new routing topology. Nvidia IB switches can handle such a failure almost immediately through Adaptive Routing, reducing recovery time to about 1ms. This feature is called SHIELD (Self-Healing Interconnect Enhancement for Intelligent Datacenters) and is also referred to as Fast Link Fault Recovery (FLFR) in Nvidia documentation.
For more details, see How To Configure Adaptive Routing and Self-Healing Networking.
Load Balancing
Nvidia switches support dynamic load balancing, which can achieve better fabric utilization than static ECMP routing. It is provided by the Adaptive Routing feature enabled on the switches and is managed centrally by the Adaptive Routing Manager, analogous to the Subnet Manager.
QoS is implemented across the fabric by defining I/O channels at the HCA level and Virtual Lanes at the link level, and is managed centrally by the Subnet Manager.
For more details, see the Nvidia QoS Documentation.
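As a contrast between the two strategies, here is a toy sketch (illustrative only, not Nvidia's actual algorithm): static ECMP hashes a flow onto one egress port regardless of congestion, while an adaptive scheme picks the least-loaded valid egress port.

```python
PORTS = ["p1", "p2", "p3", "p4"]

def ecmp_port(flow_id: int) -> str:
    """Static ECMP: the hash alone decides, ignoring congestion."""
    return PORTS[hash(flow_id) % len(PORTS)]

def adaptive_port(load: dict) -> str:
    """Adaptive: pick the egress port with the smallest queue occupancy."""
    return min(PORTS, key=lambda p: load[p])

load = {"p1": 90, "p2": 10, "p3": 55, "p4": 70}
print(adaptive_port(load))  # p2 -> the congestion-aware choice
```

An elephant flow that ECMP pins to a busy port stays there; the adaptive scheme can steer traffic around the hot spot.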
SHARP - Scalable Hierarchical Aggregation and Reduction Protocol
SHARP's primary function is to offload collective operations, removing the need for host CPUs and GPUs to send the same data multiple times. A host can send its data once and the switch handles the aggregation and replication, similar to multicast in Ethernet.
For more details, see the Nvidia SHARP Documentation.
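A back-of-the-envelope comparison shows why this matters at scale (a simplified counting model, not SHARP's actual protocol): reducing one value across n hosts by naive all-to-all exchange costs each host n-1 sends, while with in-network aggregation each host injects its data once and the switch tree does the rest.

```python
def sends_all_to_all(n: int) -> int:
    """Host-side sends if every host exchanges with every other host."""
    return n * (n - 1)

def sends_with_offload(n: int) -> int:
    """Host-side sends with switch-based aggregation: one per host."""
    return n

print(sends_all_to_all(8), "vs", sends_with_offload(8))  # 56 vs 8
```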