Physical Layer
Overview
The physical layer in InfiniBand handles the physical connection between devices. Its primary responsibilities include:
- Bit Synchronization: Ensuring the receiver can correctly interpret the incoming bit stream.
- Bit Rate Control: Maintaining the appropriate data transmission speed.
- Physical Topologies: Supporting various network configurations.
- Signal Integrity: Guaranteeing signal quality to meet Bit Error Rate (BER) requirements.
Specifically, the physical layer manages:
- Establishing a physical link when possible.
- Informing the Link Layer if the link is up or down.
- Monitoring the status of physical connections.
Packet on the Wire
At the physical layer, each data packet is framed by a start delimiter and an end delimiter, and consecutive packets are separated by idles.
graph LR
%%{init: {'theme': 'base', 'themeVariables': { 'mainBkg': '#f3f4f6', 'nodeBorder': '#374151', 'textColor': '#000000', 'lineColor': '#374151'}}}%%
A[Start Delimiter] --> B[Data Symbols]
B --> C[End Delimiter]
C --> D[Idles]
style A fill:#f3f4f6,stroke:#374151,color:#000000
style B fill:#dbeafe,stroke:#2563eb,color:#000000
style C fill:#f3f4f6,stroke:#374151,color:#000000
style D fill:#f3f4f6,stroke:#374151,color:#000000
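The ordering in the diagram can be sketched in a few lines of Python. This is only an illustration of the framing order: `START`, `END`, and `IDLE` are placeholder names rather than the actual control symbols, and `frame_packets` and `idles_between` are hypothetical names introduced here.

```python
from typing import Iterable, List

# Placeholder symbol names (assumptions), not the real control symbols.
START, END, IDLE = "START", "END", "IDLE"


def frame_packets(packets: Iterable[bytes], idles_between: int = 2) -> List[object]:
    """Return a symbol stream: START, data symbols, END, then idles, per packet."""
    stream: List[object] = []
    for payload in packets:
        stream.append(START)
        stream.extend(payload)  # each byte stands in for one data symbol
        stream.append(END)
        stream.extend([IDLE] * idles_between)
    return stream


stream = frame_packets([b"\x01\x02", b"\x03"], idles_between=1)
print(stream)
```

Each packet contributes its own delimiters, so the receiver can find packet boundaries even when packets arrive back to back with only idles in between.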
Host Channel Adapters (HCAs)
Host Channel Adapters (HCAs) are the physical network cards installed in servers to connect them to the InfiniBand fabric. NVIDIA (formerly Mellanox) is the primary vendor for these cards (e.g., the ConnectX series).
RDMA Offload
HCAs are designed to offload significant network processing from the host CPU. A key feature is Remote Direct Memory Access (RDMA), which allows the HCA to read and write application memory directly, bypassing the operating system kernel on the data path. This reduces latency and CPU overhead while sustaining high throughput.
GUID Assignment
Every HCA has a globally unique identifier (GUID) assigned by the manufacturer:
- Single Port HCA: Uses the base GUID assigned to the device.
- Multi-Port HCA:
- Port 1 uses the Base GUID.
- Subsequent ports use Base GUID + (Port Number − 1) (e.g., Port 2 = Base GUID + 1).
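As a small sketch of the scheme above, the following Python derives port GUIDs from a base GUID, following the stated rule that Port 1 uses the base GUID and Port 2 uses the base GUID plus one. The function names and the base GUID value are made up for illustration; real devices report their own GUIDs, and this arithmetic is only the convention described in this section.

```python
def port_guid(base_guid: int, port_number: int) -> int:
    """Port 1 -> base GUID, port 2 -> base GUID + 1, and so on."""
    if port_number < 1:
        raise ValueError("port numbers start at 1")
    guid = base_guid + (port_number - 1)
    if guid > 0xFFFFFFFFFFFFFFFF:
        raise ValueError("GUIDs are 64-bit values")
    return guid


def format_guid(guid: int) -> str:
    """Render a GUID as a zero-padded 64-bit hexadecimal string."""
    return f"0x{guid:016x}"


base = 0x0011223344556600  # made-up base GUID, for illustration only
for port in (1, 2):
    print(port, format_guid(port_guid(base, port)))
```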
Link Widths & Rates
InfiniBand links are composed of multiple lanes. Each “x” represents a lane, which consists of one differential pair for transmitting (TX) and one differential pair for receiving (RX).
- 1x: 1 Lane (2 pairs of wires)
- 4x: 4 Lanes (8 pairs of wires)
- 8x: 8 Lanes (16 pairs of wires)
- 12x: 12 Lanes (24 pairs of wires)
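The pair counts in the list follow directly from the lane definition. A minimal Python sketch, with hypothetical function names, makes the arithmetic explicit, assuming one TX pair and one RX pair per lane and two conductors per differential pair:

```python
def differential_pairs(link_width: int) -> int:
    """One TX pair plus one RX pair per lane."""
    return link_width * 2


def conductors(link_width: int) -> int:
    """Each differential pair is made of two conductors."""
    return differential_pairs(link_width) * 2


for width in (1, 4, 8, 12):
    print(f"{width}x: {differential_pairs(width)} pairs, {conductors(width)} conductors")
```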
Link Rate Calculation
The total link rate is determined by the speed of the individual lane multiplied by the number of lanes (Link Width).
Link Speed × Link Width = Total Link Rate
Common bandwidth generations include:
- EDR: 25Gb/s per lane
- HDR: 50Gb/s per lane
- NDR: 100Gb/s per lane
For example, an NDR 4x link provides 400Gb/s of total bandwidth.
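The same calculation can be written as a short Python sketch. The per-lane speeds are the nominal figures listed above; the table name and function name are illustrative, not part of any InfiniBand tool.

```python
# Nominal per-lane speeds in Gb/s, taken from the list above.
LANE_SPEED_GBPS = {"EDR": 25, "HDR": 50, "NDR": 100}


def link_rate_gbps(generation: str, link_width: int) -> int:
    """Total link rate = per-lane speed x number of lanes."""
    if link_width < 1:
        raise ValueError("link width must be at least 1 lane")
    return LANE_SPEED_GBPS[generation.upper()] * link_width


for generation in LANE_SPEED_GBPS:
    print(f"{generation} 4x: {link_rate_gbps(generation, 4)} Gb/s")
```

With a 4x width this gives 100, 200, and 400 Gb/s for EDR, HDR, and NDR respectively, which matches the data rates quoted in the cable tables.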
Cables & Connectors
Copper Direct Attach Cables (DAC)
Copper DACs use shielded copper pairs to transmit signals.
- Conductors: Each lane direction (TX or RX) uses one differential pair of conductors. A 4x link therefore has 4 TX pairs and 4 RX pairs, for a total of 8 differential pairs (16 individual conductors).
- Pros: Cheaper than fiber for short distances.
- Cons: Susceptible to Electromagnetic Interference (EMI), can short/wear down over time (potential fire/arc risk), bulkier (4x width of fiber).
DAC Media Types
Passive DAC cables require no port power but have the shortest reach.
| Data Rate | Connector | Max Reach | Port Power |
|---|---|---|---|
| QDR (40G) | QSFP | 7m | None |
| FDR (56G) | QSFP | 5m | None |
| EDR (100G) | QSFP28 | 5m | None |
| HDR (200G) | QSFP56 | 2m | None |
| NDR (400G) | OSFP (twin finned) / OSFP (flat) / QSFP112 | ~1.5m | None |
Active Copper Cables (ACC)
ACC cables use signal-boosting electronics in the connector to extend reach beyond passive DAC, at the cost of drawing port power. They are thinner than DAC cables.
| Data Rate | Connector | Max Reach | Port Power |
|---|---|---|---|
| HDR (200G) | QSFP56 | 4m | ~3.5W per port |
| NDR (400G) | OSFP / QSFP112 | ~3m | ~1.5W per connector |
Active Optical Cables (AOC)
AOCs use fiber optic cables with transceivers attached at each end.
- Structure: 1 physical fiber strand carries one unidirectional lane (RX or TX). A full bi-directional lane requires 2 fibers.
- Media: Typically uses Multi-Mode fiber (cheaper than Single-Mode).
- Pros: Immune to EMI, safer (no electrical short risk), smaller, easier to manage, durable jacketing.
- Cons: More expensive due to laser systems.
AOC Media Types
| Data Rate | Form Factor | Max Reach |
|---|---|---|
| EDR (100G) | QSFP28 | 100m |
| HDR (200G) | QSFP56 | 150m |
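To show how the three reach tables might be used together, here is a small Python sketch that lists which cable types cover a given distance. The reach values are copied from the tables in this section (the NDR copper figures are approximate there); the dictionary layout and the `cable_options` function are hypothetical, and a generation missing from a table is simply treated as not listed rather than unsupported.

```python
from typing import List

# Maximum reach in metres, copied from the tables above (NDR values are approximate).
MAX_REACH_M = {
    "passive DAC": {"QDR": 7.0, "FDR": 5.0, "EDR": 5.0, "HDR": 2.0, "NDR": 1.5},
    "ACC": {"HDR": 4.0, "NDR": 3.0},
    "AOC": {"EDR": 100.0, "HDR": 150.0},
}


def cable_options(generation: str, distance_m: float) -> List[str]:
    """Return the cable types whose listed reach covers the distance."""
    gen = generation.upper()
    return [
        media
        for media, reach in MAX_REACH_M.items()
        if gen in reach and distance_m <= reach[gen]
    ]


print(cable_options("HDR", 1))   # short run inside a rack
print(cable_options("HDR", 3))   # beyond passive DAC reach
print(cable_options("HDR", 50))  # optical only
```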
Physical Link Status
You can monitor the state of a physical link using tools like ibstat or ibportstate. Common statuses include:
- Polling: No cable is connected, or the link has not yet been established.
- Disabled: The port is administratively disabled. Use ibportstate to enable it.
- PortConfigurationTraining: The link is in the process of being configured; usually a transient state.
- LinkUp: The port is connected and the link is established.
- LinkErrorRecovery: The link has encountered errors and is attempting to recover. This often indicates a bad cable that should be replaced.
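For scripted checks, the physical state can be pulled out of saved ibstat output. The sketch below is a minimal parser under stated assumptions: the `SAMPLE` text is an illustrative, abbreviated layout (a `CA '<name>'` header followed by indented `Port N:` blocks containing a `Physical state:` field) with a made-up device name, not output captured from a real system, so verify the format against your own installation. The function name is hypothetical.

```python
import re
from typing import Dict, Optional, Tuple

# Illustrative, abbreviated sample; the device name and values are made up.
SAMPLE = """\
CA 'mlx5_0'
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
    Port 2:
        State: Down
        Physical state: Polling
"""


def physical_states(ibstat_text: str) -> Dict[Tuple[str, int], str]:
    """Map (CA name, port number) to the reported physical state."""
    states: Dict[Tuple[str, int], str] = {}
    ca: Optional[str] = None
    port: Optional[int] = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        if m := re.match(r"CA '(.+)'$", line):
            ca, port = m.group(1), None  # new adapter: forget the previous port
        elif m := re.match(r"Port (\d+):$", line):
            port = int(m.group(1))
        elif m := re.match(r"Physical state:\s*(.+)$", line):
            if ca is not None and port is not None:
                states[(ca, port)] = m.group(1)
    return states


states = physical_states(SAMPLE)
not_up = {key: value for key, value in states.items() if value != "LinkUp"}
print(states)
print(not_up)
```

Ports that stay in a state other than LinkUp are the ones worth investigating with the status descriptions above.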
GUID Types
- System Image GUID: Identifies the system as a whole (e.g., a chassis).
- Node GUID: Identifies a specific node (e.g., an HCA card or a switch blade).
- Port GUID: Identifies a specific port on a node.
Note: In a chassis that holds multiple blades, all blades share a single System Image GUID, while each blade has its own Node GUID.
Troubleshooting Utilities
Common OFED utilities for troubleshooting physical connections:
- ibstat: Display the status of the local HCA ports.
- ibportstate: Display and modify the status of a port (enable, disable, reset).
- ibswitches: Show all switches discovered in the fabric.
- ibhosts: Show all HCAs (hosts) in the subnet and display their GUIDs.
- ibnodes: Show all nodes in the fabric, combining the output of ibhosts and ibswitches.
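When these utilities are called from a script, it helps to handle the case where they are not installed. The following Python sketch wraps the read-only tools from the list with no arguments; it deliberately excludes ibportstate because that tool can change port state. The wrapper name, the allowlist, and the timeout are assumptions made for this example.

```python
import shutil
import subprocess
from typing import Optional

# Read-only utilities from the list above; ibportstate is left out on purpose.
READ_ONLY_TOOLS = ("ibstat", "ibswitches", "ibhosts", "ibnodes")


def run_tool(name: str, timeout_s: float = 30.0) -> Optional[str]:
    """Run a read-only utility and return its stdout, or None if unavailable."""
    if name not in READ_ONLY_TOOLS:
        raise ValueError(f"unsupported tool: {name}")
    if shutil.which(name) is None:
        return None  # utility not installed or not on PATH
    try:
        result = subprocess.run(
            [name], capture_output=True, text=True, timeout=timeout_s, check=False
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    for tool in READ_ONLY_TOOLS:
        output = run_tool(tool)
        summary = "unavailable" if output is None else f"{len(output.splitlines())} lines"
        print(f"{tool}: {summary}")
```

Returning None instead of raising keeps a monitoring loop running on hosts where the tools are missing or the fabric is unreachable.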
Further Reading
- Cabling Data Centers Design Guide: NVIDIA guide covering InfiniBand and Ethernet cable types, connectors, NDR cabling, deployment planning, and best practices.
- SFP+, SFP28, QSFP+, QSFP28, QSFP56, QSFP-DD, QSFP112 vs OSFP: Comparison of transceiver form factors, data rates, and connector types.