IB Diagnostics

InfiniBand provides a rich set of diagnostic tools for troubleshooting at both the host level and across the entire fabric.

Host-level tools inspect the local node's configuration and connectivity.

| Command | Description |
| --- | --- |
| `ofed_info` | Check the DOCA/OFED driver version |
| `lspci` | Check the type and version of installed HCAs |
| `ibstat` | Display the link status of a node in the IB fabric |
| `ibportstate <lid> <port>` | Display the link status of a specific port on a node |
| `ibroute <lid>` | Display the forwarding table of the switch with a specific LID |
| `ibv_devices` | List InfiniBand devices (HCAs) |
| `ibv_devinfo` | Display detailed information about InfiniBand devices (HCAs) |
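
As a quick host-level check, the `ibstat` output can be filtered for ports that are not in the Active state. The sketch below is illustrative only: the function name `ib_ports_not_active` is made up for this example, and it assumes the usual `ibstat` text layout (a `CA '<name>'` line, then `Port <n>:` and `State: <value>` lines), which should be confirmed against the output on your own hosts.

```shell
# Hypothetical helper: list every port whose logical state is not "Active".
# Reads ibstat-style text on stdin, so it also works on saved output.
ib_ports_not_active() {
    awk -v q="'" '
        # "CA <name>" header line: exactly two fields, name wrapped in quotes
        $1 == "CA" && NF == 2 { ca = $2; gsub(q, "", ca) }
        # "Port <n>:" header line
        $1 == "Port" && $2 ~ /^[0-9]+:$/ { port = $2; sub(/:$/, "", port) }
        # "State:" is the logical state ("Physical state:" starts with another word)
        $1 == "State:" && $2 != "Active" { printf "%s port %s: %s\n", ca, port, $2 }
    '
}
```

Typical use would be `ibstat | ib_ports_not_active`. No output means every listed port reported Active.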

Fabric-level tools operate across the whole fabric and are used for broader discovery and troubleshooting.

| Command | Description |
| --- | --- |
| `ibswitches` | Identify all switches in the IB fabric |
| `ibhosts` | Identify all HCAs in the IB fabric |
| `ibnodes` | Identify all nodes in the IB fabric |
| `ibnetdiscover` | Display node-to-node connectivity |
| `iblinkinfo` | List all nodes and connectivity information |
| `sminfo` | Identify the master Subnet Manager |
| `ibping` | Ping-pong test over IB to validate connectivity between hosts |
| `ibtracert <src-lid> <dst-lid>` | Display the route between two nodes |
| `ibdiagnet` | Comprehensive fabric health diagnostics |
| `ib_write_lat` | Measure RDMA Write latency between two nodes |
| `ib_read_lat` | Measure RDMA Read latency between two nodes |
| `ib_write_bw` | Measure RDMA Write bandwidth between two nodes |
| `ib_read_bw` | Measure RDMA Read bandwidth between two nodes |

ibdiagnet is the primary troubleshooting tool for fabric discovery, error detection, and general diagnostics. It is part of the ibutils2 package included in DOCA/OFED and UFM software packages.

It works by scanning the fabric using directed-route packets and extracting information about fabric connectivity and devices.

ibdiagnet performs the following checks:

  • Fabric Discovery — Sweeps the IB fabric and collects information from switches, HCAs, routers, aggregation nodes, and gateways.
  • Duplicated GUIDs — Reports duplicated node and port GUIDs.
  • Duplicated Node Descriptions — Warns about duplicated node descriptions for switches and HCAs.
  • LIDs Check — Validates LID assignment and checks for duplicated LIDs.
  • Links in Init/Unresponsive States — Reports links in INIT logical state and unresponsive devices, including the direct route to reach them.
  • Counters Fetch — Fetches various counters from IB devices including standard/extended port counters, diagnostic counters, and physical counters.
  • Error Counters Checks — Checks error counters crossing thresholds between counter snapshots.
  • Routing Fetch and Checks — Validates switch routing tables and checks for credit-loop free routing.
  • Link Width and Speed Checks — Verifies that fabric links are operating at maximum supported speed and width.
  • Topology Matching — Compares the live topology with a previously stored one.
  • Partition Checks — Dumps and validates HCA and switch partition tables.
  • BER Test — Reports links with high Bit Error Rates.

Running ibdiagnet without any flags performs:

  • Fabric Discovery
  • Duplicated GUIDs Check
  • Duplicated Node Description Check
  • LID Check
  • Links Check
  • Subnet Managers Check
  • Port Counters Snapshot/Checks (1 second period)
  • Nodes Information Check (uniform firmware versions)
  • Speed/Width Check
  • Alias GUIDs
  • Dump Virtualization Information
  • Partition Keys Checks
  • Dump Temperature Sensing
  • Create Network Dump file (ibnetdiscover format)

    ibdiagnet

If there are multiple HCAs, ibdiagnet runs on the first active interface. To select a specific HCA and port:

    ibdiagnet -i <hca-name> -p <port-num>

Check the version:

    ibdiagnet --version

The standard output of ibdiagnet groups results by check, separated by dashes:

--------------------------------
Discovery
--------------------------------
Lids Check
--------------------------------
Links Check
--------------------------------
Subnet Manager
--------------------------------
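
For a long run it can help to reduce the output to a count of problems. The sketch below assumes that ibdiagnet prefixes error lines with `-E-` and warning lines with `-W-` (informational lines use `-I-`). Verify that convention against the output of your installed version before relying on it. The function name `ibdiag_summary` is made up for this example.

```shell
# Hypothetical helper: count lines that start with -E- (errors) and -W- (warnings).
# With no arguments it reads stdin; otherwise it reads the named files.
ibdiag_summary() {
    awk '
        /^-E-/ { errors++ }
        /^-W-/ { warnings++ }
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    ' "$@"
}
```

It could be used as `ibdiagnet | ibdiag_summary`, or pointed at a saved log with `ibdiag_summary /var/tmp/ibdiagnet2/ibdiagnet2.log`.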

ibdiagnet writes detailed results to several files in /var/tmp/ibdiagnet2/ by default. The output path can be changed with -o or --output_path.

| File | Contents |
| --- | --- |
| `ibdiagnet2.log` | Log file for the ibdiagnet run |
| `ibdiagnet2.lst` | Fabric links in LST format |
| `ibdiagnet2.net_dump` | Fabric link dump including split cable mapping and FEC info |
| `ibdiagnet2.sm` | Subnet managers (list of all SMs, state, priority) |
| `ibdiagnet2.pm` | IB spec compliant port counters |
| `ibdiagnet2.fdbs` | Unicast Forwarding Tables |
| `ibdiagnet2.ar` | Adaptive routing tables |
| `ibdiagnet2.nodes_info` | Nodes information |
| `ibdiagnet2.pkey` | Pkey tables (all partitions and member host GUIDs) |
| `ibdiagnet2.slvl` | SLVL tables of fabric switches |
| `ibdiagnet2.ibnetdiscover` | Discovered network in ibnetdiscover format |
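
After a run, a small check can report which of the files listed above are absent from the output directory. This is only a sketch: the function name `ibdiag_missing_files` is made up, and a missing file is not necessarily a failure, since which files are written may depend on the checks that were enabled and on the ibdiagnet version.

```shell
# Hypothetical helper: print the name of each listed output file that is absent.
# $1 is the output directory; it defaults to ibdiagnet's default path.
# Note: plain POSIX sh has no "local", so dir and name are global variables.
ibdiag_missing_files() {
    dir="${1:-/var/tmp/ibdiagnet2}"
    for name in log lst net_dump sm pm fdbs ar nodes_info pkey slvl ibnetdiscover; do
        [ -f "$dir/ibdiagnet2.$name" ] || printf '%s\n' "ibdiagnet2.$name"
    done
}
```

Calling `ibdiag_missing_files` with no argument checks the default directory; pass the path given to `-o` if the output location was changed.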

Adjust the pause between the two port counter samples that ibdiagnet compares (default is 1 second):

    ibdiagnet --pm_pause_time 60
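
To make the idea of comparing two counter samples concrete, the sketch below computes per-counter deltas between two snapshots and prints those above a threshold. It is a conceptual illustration, not how ibdiagnet is implemented: the function name `counter_deltas` and the input format (one `<port-id> <counter-name> <value>` triple per line) are assumptions made for this example and do not match the layout of `ibdiagnet2.pm`.

```shell
# Hypothetical helper: counter_deltas BEFORE AFTER [THRESHOLD]
# Prints "<port-id> <counter-name> +<delta>" for every counter that grew by
# more than THRESHOLD (default 0) between the two snapshot files.
# Assumes BEFORE is not empty (the NR == FNR idiom relies on that).
counter_deltas() {
    awk -v limit="${3:-0}" '
        NR == FNR { first[$1 " " $2] = $3; next }   # first file: remember values
        {
            key = $1 " " $2
            if (key in first) {
                delta = $3 - first[key]
                if (delta > limit)
                    printf "%s +%d\n", key, delta
            }
        }
    ' "$1" "$2"
}
```

A longer pause gives slowly incrementing error counters more time to cross the threshold, which is why a value such as 60 seconds can surface links that look clean over a 1 second window.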

Generate a topology file:

    ibdiagnet -w /var/tmp/ibdiagnet2/topology