Subnet Manager

The Subnet Manager (SM) is the brain of an InfiniBand network. It is a centralized software entity responsible for discovering, configuring, and managing the fabric. Without an active SM, an InfiniBand network cannot function—even for a simple point-to-point connection between two servers.

The Subnet Manager performs several critical tasks to bring the network up and keep it running efficiently:

  1. Network Discovery (Sweep)

    • The SM scans the network to discover all active devices (switches, HCAs, routers).
    • It identifies the topology, including how switches and nodes are interconnected.
  2. Addressing (LID Assignment)

    • Assigns a unique Local Identifier (LID) to every port on the subnet.
    • LIDs are 16-bit addresses used for local routing within the subnet (similar to MAC addresses in Ethernet, but assigned dynamically).
  3. Routing Calculation & Deployment

    • Calculates the most efficient paths for traffic between all pairs of nodes based on the topology.
    • Supports various routing algorithms (e.g., MinHop, Up/Down, Fat-Tree, Torus-2QoS).
    • Programs the Linear Forwarding Tables (LFTs) on every switch, instructing them where to forward packets based on destination LIDs.
  4. Traffic Isolation (Partitioning)

    • Manages Partitions using Partition Keys (P_Keys).
    • Ensures that nodes can only communicate with other nodes that share the same P_Key (similar to VLANs in Ethernet).
  5. Quality of Service (QoS)

    • Configures Service Levels (SLs) and maps them to Virtual Lanes (VLs).
    • Sets up arbitration on switches to prioritize critical traffic (e.g., low-latency MPI traffic over bulk storage traffic).
  6. Fault Management

    • Continuously monitors the fabric for changes (link failures, new devices).
    • Receives traps that report topology changes and, if necessary, triggers a Heavy Sweep to rediscover and re-route the fabric.
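
To make the discovery, LID assignment, and forwarding-table steps concrete, here is a minimal Python sketch. It is a simplified model rather than how OpenSM is implemented: the topology, node names, and port numbers are hypothetical, each node gets a single LID (a real SM assigns LIDs per port), and routes are plain shortest paths found with a breadth-first search.

```python
from collections import deque

# Hypothetical fabric: switch name -> {port number: neighbor name}.
# All names and port numbers are invented for illustration.
SWITCH_PORTS = {
    "sw1": {1: "hca_a", 2: "hca_b", 3: "sw2"},
    "sw2": {1: "hca_c", 2: "sw1"},
}


def assign_lids(switch_ports):
    """Give every discovered node a unique LID, starting at 1."""
    nodes = set(switch_ports)
    for ports in switch_ports.values():
        nodes.update(ports.values())
    return {name: lid for lid, name in enumerate(sorted(nodes), start=1)}


def build_lfts(switch_ports, lids):
    """For each switch, map destination LID -> output port on a shortest path."""
    lfts = {}
    for switch, ports in switch_ports.items():
        table = {}
        visited = {switch}
        queue = deque()
        # Seed the search with each direct neighbor and the port that reaches it.
        for port, neighbor in sorted(ports.items()):
            if neighbor not in visited:
                visited.add(neighbor)
                table[lids[neighbor]] = port
                queue.append((neighbor, port))
        while queue:
            node, first_port = queue.popleft()
            # Only switches have onward links here; HCAs are leaves.
            for neighbor in switch_ports.get(node, {}).values():
                if neighbor not in visited:
                    visited.add(neighbor)
                    table[lids[neighbor]] = first_port
                    queue.append((neighbor, first_port))
        lfts[switch] = table
    return lfts


lids = assign_lids(SWITCH_PORTS)
lfts = build_lfts(SWITCH_PORTS, lids)
```

In this sample fabric, `lfts["sw1"][lids["hca_c"]]` is port 3, the link from sw1 to sw2, because hca_c is only reachable through sw2.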

  • Subnet Management Agent (SMA): Every InfiniBand device (switch or HCA) runs a small agent called the SMA. The centralized SM communicates with these agents via Subnet Management Packets (SMPs) to get status updates and push configurations.
  • Master vs. Standby:
    • A subnet can have multiple SMs running for redundancy, but only one is the Master SM.
    • All others are Standby SMs. They monitor the Master and synchronize their state.
    • If the Master fails, an election occurs, and a Standby promotes itself to Master.
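
The election can be sketched in a few lines of Python. The text above does not spell out the rule, so this sketch assumes the commonly described InfiniBand convention: the highest priority wins, and a tie is broken by the lowest port GUID. The `SubnetManager` class, the SM names, and the GUID values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubnetManager:
    name: str       # hypothetical label for this sketch
    priority: int   # 0-15; higher wins (assumed rule)
    port_guid: int  # lower wins on a priority tie (assumed rule)


def elect_master(candidates):
    """Return the SM that should become master among the live candidates."""
    return min(candidates, key=lambda sm: (-sm.priority, sm.port_guid))


sms = [
    SubnetManager("switch-sm", priority=14, port_guid=0x0002C90300A1B2C3),
    SubnetManager("server-sm", priority=14, port_guid=0x0002C90300A1B2C0),
    SubnetManager("backup-sm", priority=8, port_guid=0x0002C90300000001),
]
master = elect_master(sms)
# If the master disappears, the same rule picks its successor.
successor = elect_master([sm for sm in sms if sm != master])
```

Setting distinct priorities on each SM avoids relying on the GUID tie-break, so the outcome of a failover is easier to predict.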

The Subnet Manager continuously monitors the fabric using two types of “sweeps” to ensure the topology is up-to-date.

A Light Sweep is a low-impact check performed periodically to detect status changes without disrupting the fabric.

  • Frequency: Occurs automatically at a set interval (default is every 10 seconds).
  • What it does:
    • Queries the status of ports and nodes.
    • Checks for changes in SM priority or the presence of new SMs.
  • Outcome: If the Light Sweep detects any significant change (e.g., a port that was down is now active), it immediately triggers a Heavy Sweep.

A Heavy Sweep is a comprehensive discovery and configuration process. It is more resource-intensive and can momentarily impact fabric traffic.

  • When it happens:
    • Triggered by a Light Sweep finding a change.
    • Triggered by a Trap: If a switch or HCA reports a critical event (like a link going up or down), it sends a trap to the SM, causing an immediate Heavy Sweep.
    • Manual Trigger: Can be forced by an administrator (e.g., by restarting OpenSM or sending the OpenSM process a SIGHUP signal).
  • What it does:
    1. Full Discovery: Rediscovers the entire network topology.
    2. LID Assignment: Assigns new LIDs to any newly discovered devices.
    3. Routing Calculation: Recalculates the routing tables (LFTs) for the entire fabric to handle the new topology or route around failures.
    4. Reprogramming: Pushes the new forwarding tables to all switches.
  • Impact: During the reprogramming phase, traffic on affected routes may experience a brief pause or latency.
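
The relationship between the two sweeps can be modelled as a small decision function. This is an illustrative sketch only: the port identifiers, state strings, and phase names are hypothetical, and a real SM queries devices with SMPs rather than comparing dictionaries.

```python
# Phase names are labels for this sketch, mirroring the four steps above.
HEAVY_SWEEP_PHASES = ("discover", "assign_lids", "compute_routes", "program_lfts")


def light_sweep(previous, current):
    """Return the ports whose state differs between two snapshots."""
    ports = set(previous) | set(current)
    return sorted(p for p in ports if previous.get(p) != current.get(p))


def sweep_once(previous, current, trap_received=False):
    """Run one monitoring cycle and report what the SM would do.

    A heavy sweep runs if the light sweep found a change or a trap arrived.
    """
    changed = light_sweep(previous, current)
    heavy = bool(changed) or trap_received
    return {
        "changed": changed,
        "phases": list(HEAVY_SWEEP_PHASES) if heavy else [],
    }


before = {"sw1/1": "ACTIVE", "sw1/2": "DOWN"}
after = {"sw1/1": "ACTIVE", "sw1/2": "ACTIVE"}
result = sweep_once(before, after)
```

Here `result["changed"]` contains only `"sw1/2"`, and because that list is non-empty, all four heavy-sweep phases are scheduled. Calling `sweep_once(before, before)` schedules none.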

The SM can use different algorithms depending on the network topology:

  • MinHop: Finds the path with the fewest number of hops. Good for irregular topologies but can cause congestion.
  • Up/Down: Prevents credit loops (and the deadlocks they cause) in irregular networks by ranking switches in a hierarchy (root nodes vs. leaf nodes) and forbidding paths that go back up after going down.
  • Fat-Tree: Optimized for Fat-Tree topologies (common in HPC). Aims for congestion-free routing of regular traffic patterns so the topology's bisection bandwidth can be used.
  • Dimension Order Routing (DOR): Used for mesh, torus, and hypercube topologies.
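
As an illustration of the Up/Down idea, the sketch below ranks switches by their distance from a root and then checks whether a path is legal, meaning it never travels up again once it has gone down. The topology and switch names are hypothetical. Treating a hop between equally ranked switches as "down" is a simplification made for this sketch; real implementations use an additional tie-break.

```python
from collections import deque


def rank_switches(links, roots):
    """BFS from the root switches; rank = hop distance from the nearest root."""
    rank = {root: 0 for root in roots}
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in rank:
                rank[neighbor] = rank[node] + 1
                queue.append(neighbor)
    return rank


def is_legal_updn_path(path, rank):
    """A path is legal if it never goes up after its first non-up hop."""
    gone_down = False
    for here, there in zip(path, path[1:]):
        going_up = rank[there] < rank[here]
        if going_up and gone_down:
            return False
        if not going_up:
            # Equal-rank hops count as "down" in this simplified model.
            gone_down = True
    return True


# Hypothetical fabric: one root, two mid-level switches, one lower switch.
links = {
    "spine1": ["leaf1", "leaf2"],
    "leaf1": ["spine1", "tor"],
    "leaf2": ["spine1", "tor"],
    "tor": ["leaf1", "leaf2"],
}
rank = rank_switches(links, roots=["spine1"])
via_spine = is_legal_updn_path(["leaf1", "spine1", "leaf2"], rank)
via_tor = is_legal_updn_path(["leaf1", "tor", "leaf2"], rank)
```

Both candidate paths between leaf1 and leaf2 are two hops long, but only the one through spine1 is legal. The path through tor goes down and then up, which is the kind of turn the algorithm forbids.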

The Subnet Manager can run in three locations. The choice should be based on fabric scale, feature requirements, and licensing budget.

| Deployment | Scale | Adaptive Routing | Dragonfly+ | License Required |
|---|---|---|---|---|
| Managed Switch (Embedded SM) | Up to 2,048 nodes | No | No | No |
| Server (OpenSM) | Medium–Large | Yes | Yes | No |
| UFM | Medium–Large | Yes | Yes | Yes (per device) |
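
The comparison can also be written as a small helper that lists which deployments fit a given set of requirements. This is a sketch of the criteria in the table, not an official sizing tool. The function and parameter names are hypothetical, and it assumes that OpenSM and UFM remain valid (if oversized) choices for small fabrics.

```python
def deployment_options(nodes, needs_adaptive_routing=False,
                       needs_dragonfly_plus=False, license_budget=False):
    """Return the SM deployments compatible with the stated requirements."""
    options = []
    advanced = needs_adaptive_routing or needs_dragonfly_plus
    # The embedded SM is limited to 2,048 nodes and lacks both features.
    if nodes <= 2048 and not advanced:
        options.append("Managed Switch (Embedded SM)")
    # OpenSM needs no license and supports both features.
    options.append("Server (OpenSM)")
    # UFM is licensed per device, so it only qualifies with a budget.
    if license_budget:
        options.append("UFM")
    return options


small_fabric = deployment_options(500)
ar_fabric = deployment_options(500, needs_adaptive_routing=True,
                               license_budget=True)
```

For the 500-node fabric with no special requirements, the embedded SM and OpenSM both qualify. Requiring Adaptive Routing removes the embedded SM from the list.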

Nvidia Managed InfiniBand switches run the MLNX-OS operating system, which includes an embedded Subnet Manager. This is the simplest option for smaller fabrics.

  • Scale: Up to 2,048 nodes.
  • Limitations: Does not support Adaptive Routing or the Dragonfly+ routing engine.
  • Management: Managed switches support both in-band (via IB fabric) and out-of-band (via an RJ45 Ethernet management port with a separate IP address) management. Unmanaged switches only support in-band management (no CPU or management capability).

SSH into the managed switch and use the CLI (Cisco-like shell):

```
enable
configure terminal
# Check SM status (should be disabled by default)
show ib sm
# Enable the SM
ib sm
# Verify it is running
show ib sm
# Save the configuration
end
wr mem
```

It is recommended to set an explicit priority (especially when running multiple SMs):

```
enable
configure terminal
ib sm sm-priority 14
# Verify
show ib sm sm-priority
```

To select a routing engine (still in configuration mode):

```
# List available routing engines
ib sm routing-engines ?
# Set the routing engine (e.g., UpDn)
ib sm routing-engines updn
```

Use ib sm ? to see all available SM configuration options.

For medium-to-large fabrics, OpenSM can be run on a server with the Nvidia DOCA-OFED stack installed.

  • Scale: The default settings suit fabrics of up to roughly 200 nodes; larger fabrics require tuning.
  • Features: Supports Adaptive Routing and the Dragonfly+ routing engine.
  • License: None required.

OpenSM should be run as a daemon (system service).

```bash
# Run OpenSM with defaults
opensm
# View help
opensm -h
```

OpenSM logs to two locations:

  • /var/log/messages: General major events.
  • /var/log/opensm.log: Detailed information and errors. All errors should be treated as indicators of fabric health.
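
Since every error in the detailed log is worth attention, a short script can summarize which error codes appear and how often. This sketch assumes that error lines contain a marker of the form `ERR` followed by a four-digit hexadecimal code and a colon. Check that assumption against your own log before relying on it. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed marker for error lines: "ERR <4 hex digits>:".
ERR_PATTERN = re.compile(r"\bERR ([0-9A-Fa-f]{4}):")


def count_error_codes(lines):
    """Count occurrences of each error code in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ERR_PATTERN.search(line)
        if match:
            counts[match.group(1).upper()] += 1
    return counts


def summarize_log(path="/var/log/opensm.log"):
    """Read the log at `path` and return the error-code counts."""
    with open(path, errors="replace") as handle:
        return count_error_codes(handle)
```

Calling `summarize_log().most_common(5)` on the SM host would list the five most frequent codes, which is a quick way to spot a recurring fabric problem.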

The OpenSM configuration file is typically stored at /etc/opensm/opensm.conf.

To generate a default configuration file:

```bash
opensm -c /etc/opensm/opensm.conf
```

Edit the configuration file to set the routing engine:

```
routing_engine ar_updn
```

Multiple engines can be specified as fallbacks. The OpenSM documentation describes this as a comma-separated list. OpenSM tries each engine in order and falls back to Min-Hop if all of them fail:

```
routing_engine ar_updn,updn
```
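
The fallback order can be pictured with a short Python sketch. It is a model of the behaviour described above, not OpenSM's code. The `can_route` callback is a hypothetical stand-in for "this engine succeeded on the current topology", and the parser accepts commas or whitespace between engine names so that either spelling of the list is handled.

```python
import re


def parse_routing_engines(line):
    """Extract engine names from a 'routing_engine ...' config line."""
    parts = line.split(None, 1)
    if not parts or parts[0] != "routing_engine":
        raise ValueError(f"not a routing_engine line: {line!r}")
    value = parts[1] if len(parts) > 1 else ""
    # Accept commas and/or whitespace between names.
    return [name for name in re.split(r"[,\s]+", value) if name]


def select_engine(engines, can_route, fallback="minhop"):
    """Return the first engine that can route the fabric, else the fallback."""
    for engine in engines:
        if can_route(engine):
            return engine
    return fallback


engines = parse_routing_engines("routing_engine ar_updn,updn")
# Pretend only plain Up/Down works on this (hypothetical) fabric.
chosen = select_engine(engines, can_route=lambda name: name == "updn")
```

With this input, `ar_updn` is tried first and rejected, so `chosen` is `"updn"`. If no listed engine succeeds, the function returns `"minhop"`.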

As of OpenSM 5.10 (November 2021), the default routing engine is UpDn with Adaptive Routing (ar_updn); earlier versions defaulted to Min-Hop.

After making changes, restart the service:

```bash
systemctl restart opensmd
```

Unified Fabric Manager (UFM) is a licensed Nvidia product for medium-to-large scale fabrics. It provides a WebUI-based platform for comprehensive fabric management.

  • Core: Uses OpenSM under the hood but adds enhanced diagnostics, telemetry, and automation.
  • Deployment: Can run as a daemon, Docker container, or on a dedicated Nvidia hardware appliance.
  • License: Per managed device.

| Platform | Description |
|---|---|
| UFM Telemetry | Network telemetry, application workload usage, and system configuration. |
| UFM Enterprise | Adds automated network discovery/provisioning, traffic monitoring, congestion detection, job scheduler integration (Slurm, LSF), and cloud manager integration (OpenStack, Azure, VMware). |
| UFM Cyber-AI | Adds preventive maintenance and cybersecurity analytics for reducing supercomputing operational costs. |

To configure the Subnet Manager in the UFM WebUI:

  1. Navigate to Main Navigation > Settings.
  2. Select the Subnet Manager tab.
  3. Configure settings under the relevant sub-tabs.

To configure a routing engine:

Settings > Network Management > Routing Engine