Routing & Flow Control

InfiniBand fabrics rely on sophisticated routing algorithms to ensure efficient, deadlock-free communication.

The Routing Engine determines the path packets take through the fabric. The Subnet Manager (SM) calculates these paths based on the topology and configured algorithm.

Different topologies require specific routing engines:

| Routing Engine | Topology Compatibility | Key Features |
| --- | --- | --- |
| Min-Hop | Any | Shortest path; default algorithm; does NOT prevent credit loops. |
| UpDn (Up/Down) | Fat-Tree | Prevents credit loops by enforcing hierarchical routing (no down-up-down paths). |
| Fat-Tree | Fat-Tree | Optimized for symmetric Fat-Trees; balances traffic across spines. |
| Torus-2QoS | Torus 2D/3D | Prevents deadlocks; supports QoS levels; handles failures without credit loops. |
| Dragonfly+ | Dragonfly+ | Requires Adaptive Routing; uses non-minimal paths for load balancing. |

Min-Hop:

  • Default: Used if no other engine is specified.
  • Method: Calculates the minimum number of hops between nodes.
  • Limitations: It does not analyze credit dependencies, so it is prone to credit loops in complex topologies.
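
The hop-count calculation can be pictured as a breadth-first search over the switch graph. The following is a minimal sketch, not OpenSM's implementation; the `fabric` adjacency map and its node names are hypothetical.

```python
from collections import deque

def min_hops(adjacency, source):
    """Return the minimum hop count from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in hops:          # first visit is the shortest path
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops

# Hypothetical two-leaf, two-spine fabric (illustrative names only).
fabric = {
    "leaf1": ["spine1", "spine2"],
    "leaf2": ["spine1", "spine2"],
    "spine1": ["leaf1", "leaf2"],
    "spine2": ["leaf1", "leaf2"],
}
print(min_hops(fabric, "leaf1"))
```

Note that the search only minimizes hops; nothing in it looks at which turns the resulting paths take, which is why this approach alone cannot rule out credit loops.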

UpDn (Up/Down) is designed for Fat-Tree topologies to prevent deadlocks.

  • Concept: Traffic must always go “Up” towards root switches (Spines) and then “Down” towards destination leaves. It forbids “Down-Up” turns (e.g., routing from one leaf to another through a lower-tier switch).
  • Root Nodes: Requires identifying the “Root” switches (Rank 0). The algorithm ranks switches based on distance from roots.
  • Configuration:
    1. Create /etc/opensm/root_guid.conf with the GUIDs of your root switches.
    2. Edit /etc/opensm/opensm.conf:
      root_guid_file /etc/opensm/root_guid.conf
      routing_engine updn
    3. Restart OpenSM (systemctl restart opensmd; on some distributions the service is named opensm).
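
To make the Up/Down rule concrete, here is a small sketch of the two ideas described above: ranking switches by distance from the roots, and rejecting any path that goes up after it has gone down. It is an illustration under simplifying assumptions (links between equal-rank switches are ignored, and the `fabric` map and switch names are hypothetical), not the algorithm as implemented in OpenSM.

```python
from collections import deque

def rank_switches(adjacency, roots):
    """Rank each switch by BFS distance from the nearest root (roots are rank 0)."""
    rank = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in rank:
                rank[neighbor] = rank[node] + 1
                queue.append(neighbor)
    return rank

def is_updn_legal(path, rank):
    """Return True if the path never turns up after it has gone down."""
    gone_down = False
    for here, nxt in zip(path, path[1:]):
        if rank[nxt] > rank[here]:                    # moving away from the roots: "down"
            gone_down = True
        elif rank[nxt] < rank[here] and gone_down:    # "up" after "down": forbidden turn
            return False
    return True

# Hypothetical two-leaf, two-spine fabric (illustrative names only).
fabric = {
    "leaf1": ["spine1", "spine2"],
    "leaf2": ["spine1", "spine2"],
    "spine1": ["leaf1", "leaf2"],
    "spine2": ["leaf1", "leaf2"],
}
rank = rank_switches(fabric, ["spine1", "spine2"])
print(is_updn_legal(["leaf1", "spine1", "leaf2"], rank))             # up, then down
print(is_updn_legal(["spine1", "leaf1", "spine2", "leaf2"], rank))   # down, then up
```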

The Fat-Tree engine is optimized for symmetric Fat-Trees.

  • Load Balancing: Unlike simple Min-Hop, it intentionally spreads traffic across different spine switches to avoid congestion.
  • Requirement: The topology must be symmetric (same number of uplinks/downlinks per switch at each tier).
  • Benefit: Ensures even distribution of traffic flows.
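
The balancing idea can be illustrated with a simple round-robin assignment of destinations to a leaf switch's uplinks. This is only a stand-in to show what "even distribution" means; the real Fat-Tree engine is more involved, and the function name, LIDs, and port numbers below are hypothetical.

```python
def assign_uplinks(destination_lids, uplink_ports):
    """Spread destinations over uplinks round-robin so each uplink carries a similar share."""
    return {
        lid: uplink_ports[index % len(uplink_ports)]
        for index, lid in enumerate(sorted(destination_lids))
    }

# Four hypothetical destination LIDs spread over two uplink ports.
table = assign_uplinks([10, 11, 12, 13], uplink_ports=[1, 2])
print(table)
```

With a symmetric topology every leaf has the same number of uplinks, so an assignment like this gives each spine the same number of destinations. When the tiers are asymmetric, that guarantee no longer holds.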

Adaptive Routing (AR) allows switches to dynamically select the best path for a packet based on real-time network conditions.

  • Static Routing: The SM calculates a single best path and programs one output port per destination in each switch. Even if equal-cost paths exist, only one is used for a given destination.
  • Adaptive Routing: The SM identifies a group of ports with the same cost to a destination. The switch then chooses the least congested port for each packet.
  • Load Balancing: Utilizes multiple paths simultaneously, which can raise effective bandwidth when a single static path would otherwise be the bottleneck.
  • Fault Tolerance: If a port fails, traffic is automatically rerouted to another port in the group without SM intervention.
  • Reduced Congestion: Avoids hotspots by steering traffic away from busy links.

Note: Topologies like Dragonfly+ require AR because they rely on non-minimal paths to achieve full bandwidth.
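
The per-packet choice can be sketched as picking the least-loaded usable port from the group the SM identified. Real switches make this decision in hardware; the `queue_depth` and `link_up` maps below are hypothetical stand-ins for the congestion and link state a switch would observe.

```python
def pick_port(port_group, queue_depth, link_up):
    """Choose the usable port in the group with the smallest current queue depth."""
    candidates = [port for port in port_group if link_up[port]]
    if not candidates:
        raise RuntimeError("no usable port in the adaptive routing group")
    return min(candidates, key=lambda port: queue_depth[port])

group = [1, 2, 3]                                  # equal-cost ports chosen by the SM
depth = {1: 5, 2: 0, 3: 2}                         # hypothetical queued packets per port
print(pick_port(group, depth, {1: True, 2: True, 3: True}))
print(pick_port(group, depth, {1: True, 2: False, 3: True}))   # port 2 has failed
```

The second call shows the fault-tolerance case: a failed port simply drops out of the candidate set, so no recalculation by the SM is needed for traffic to keep flowing.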

InfiniBand uses Credit-Based Flow Control to prevent buffer overflows and packet loss.

  • Credits: A sender can only transmit a packet if the receiver has granted enough “credits” (buffer space).
  • Per-VL: Credits are tracked independently for each Virtual Lane (VL).
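
A toy model of per-VL credit accounting, as a sketch only: the class name, the credit units, and the VL numbers are illustrative assumptions rather than the wire-level mechanism.

```python
class CreditedLink:
    """Tracks the credits a receiver has granted to a sender, separately per VL."""

    def __init__(self, credits_per_vl):
        self.credits = dict(credits_per_vl)

    def try_send(self, vl, size):
        """Consume credits and return True, or return False if the VL lacks credits."""
        if self.credits[vl] < size:
            return False            # sender must wait; nothing is dropped
        self.credits[vl] -= size
        return True

    def grant(self, vl, amount):
        """The receiver freed buffer space and returns credits for this VL."""
        self.credits[vl] += amount

link = CreditedLink({0: 2, 1: 4})
print(link.try_send(0, 2))   # VL0 has enough credits
print(link.try_send(0, 1))   # VL0 is now exhausted
print(link.try_send(1, 1))   # VL1 is unaffected by VL0
```

Because each VL has its own counter, a stalled VL does not block traffic on the others. Credit loop avoidance schemes that switch VLs along a path depend on this property.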

A Credit Loop occurs when a cycle of dependencies forms in the network, where every switch is waiting for credits from its neighbor to send data. This causes a deadlock that can freeze the entire fabric.
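
One way to reason about this is to model "channel X waits for credits on channel Y" as a directed graph and look for a cycle. The sketch below does this with a depth-first search; the channel names and the `dependencies` maps are hypothetical, and it is meant as an illustration rather than a description of any particular diagnostic tool.

```python
def has_credit_loop(dependencies):
    """Return True if the directed channel-dependency graph contains a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2     # unvisited, on the current DFS path, finished
    color = {}

    def visit(node):
        color[node] = GREY
        for nxt in dependencies.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:        # reached a node already on the current path
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(node, WHITE) == WHITE and visit(node)
               for node in list(dependencies))

# Three channels that each wait on the next one form a loop.
ring = {"A->B": ["B->C"], "B->C": ["C->A"], "C->A": ["A->B"]}
chain = {"A->B": ["B->C"], "B->C": []}
print(has_credit_loop(ring))
print(has_credit_loop(chain))
```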

Prevention:

  • Routing Algorithms: Engines like UpDn and Fat-Tree prevent loops by enforcing strict path rules (e.g., prohibiting certain turns).
  • Virtual Lanes: Dragonfly+ uses VL increments to avoid loops. Packets moving “Up” use one VL, and packets moving “Down” use another (e.g., VL0 -> VL1), breaking the cycle dependency.