Fabric Topologies
Choosing the right topology is critical for performance, scalability, and cost-effectiveness in an InfiniBand fabric.
Leaf-Spine Architecture
The Leaf-Spine architecture is the most common foundation for InfiniBand fabrics.
- Leaf Switches: Connect directly to the compute nodes (servers).
- Spine Switches: Connect to all Leaf switches, aggregating traffic.
- Traffic Flow: Traffic between any two nodes on different leaves travels up to a spine and back down to the destination leaf.
Benefits:
- Predictable Latency: Any two nodes on different leaves are always separated by the same path (leaf, spine, leaf), so the hop count is deterministic.
- Scalability: Easy to expand by adding more spines or leaves.
- Redundancy: All leaves connect to all spines; losing a spine only reduces bandwidth, not connectivity.
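The traffic flow and redundancy properties above can be checked on a small model. The following is a minimal sketch, not a fabric management tool: the function names, the node naming scheme (`leaf0`, `spine0`, `host0-0`), and the fabric size are all assumptions made for illustration.

```python
from collections import deque


def build_leaf_spine(num_leaves, num_spines, hosts_per_leaf):
    """Return an adjacency dict for a two-tier leaf-spine fabric (hypothetical model)."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for l in range(num_leaves):
        leaf = f"leaf{l}"
        for s in range(num_spines):
            link(leaf, f"spine{s}")      # every leaf connects to every spine
        for h in range(hosts_per_leaf):
            link(leaf, f"host{l}-{h}")   # hosts attach only to their leaf
    return adj


def switch_hops(adj, src, dst):
    """Count the switches on a shortest path between two hosts, using BFS."""
    if src == dst:
        return 0
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node] - 1        # links on the path minus one
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("no path between nodes")


def without_node(adj, dead):
    """Return a copy of the fabric with one node (for example a failed spine) removed."""
    return {n: nbrs - {dead} for n, nbrs in adj.items() if n != dead}


fabric = build_leaf_spine(num_leaves=4, num_spines=2, hosts_per_leaf=2)
print(switch_hops(fabric, "host0-0", "host0-1"))    # same leaf: 1 switch
print(switch_hops(fabric, "host0-0", "host3-1"))    # different leaves: 3 switches
degraded = without_node(fabric, "spine0")
print(switch_hops(degraded, "host0-0", "host3-1"))  # still 3 after losing a spine
```

The model only captures connectivity. It does not show the bandwidth reduction that follows a spine failure, which depends on link speeds and routing.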
3-Level Spine-Leaf (Super Spine)
For very large clusters, a third layer is added. Super Spines connect sets of Spine/Leaf groups, allowing the fabric to scale to thousands of nodes.
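To see why a third layer matters, it helps to estimate how many hosts each design can reach. The sketch below uses the standard capacity bound for a non-blocking fat-tree built from identical switches, 2 × (radix / 2)^levels. It assumes every switch has the same port count and every port runs at the same speed; the function name and the radix values are illustrative and not taken from any specific product.

```python
def max_hosts(radix, levels):
    """Upper bound on hosts in a non-blocking fat-tree of identical switches.

    radix  -- ports per switch (assumed even, same for every switch)
    levels -- switch tiers: 2 = leaf/spine, 3 = leaf/spine/super spine
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    if levels < 1:
        raise ValueError("levels must be >= 1")
    # Each tier below the top splits its ports half down, half up.
    return 2 * (radix // 2) ** levels


for radix in (40, 64):
    print(radix, max_hosts(radix, 2), max_hosts(radix, 3))
# 40 800 16000
# 64 2048 65536
```

Oversubscribed designs reach more hosts per switch than this bound, because they give more than half of each leaf's ports to servers.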
Fat-Tree Topology
A Fat-Tree is a topology where links closer to the top (root) of the tree are “fatter” (have more bandwidth) to prevent congestion. It is the standard implementation of Leaf-Spine in HPC.
Oversubscription
Fat-Tree performance is often defined by its Oversubscription Ratio: the ratio of downlink bandwidth (to servers) to uplink bandwidth (to spines).
- 1:1 (Non-Blocking): For every 100 Gb/s of bandwidth to servers, there is 100 Gb/s of bandwidth to spines. This ensures full line-rate performance for all nodes simultaneously.
- 2:1 (Blocking): For every 200 Gb/s to servers, there is only 100 Gb/s to spines. End nodes may not achieve full bandwidth if all of them transmit at once, but latency remains low.
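The ratio can be computed per leaf switch from its port allocation. This is a minimal sketch: the function names, the port counts, and the 100 Gb/s default link speed are assumptions chosen to match the examples above.

```python
from fractions import Fraction


def oversubscription(down_ports, up_ports, down_gbps=100, up_gbps=100):
    """Downlink-to-uplink bandwidth ratio for one leaf switch."""
    if down_ports <= 0 or up_ports <= 0:
        raise ValueError("port counts must be positive")
    return Fraction(down_ports * down_gbps, up_ports * up_gbps)


def worst_case_host_gbps(down_ports, up_ports, down_gbps=100, up_gbps=100):
    """Per-host bandwidth if every host on the leaf sends off-leaf at once."""
    fair_share = up_ports * up_gbps / down_ports
    return min(down_gbps, fair_share)  # a host can never exceed its own link


print(oversubscription(20, 20))      # 1  -> 1:1, non-blocking
print(oversubscription(24, 12))      # 2  -> 2:1, blocking
print(worst_case_host_gbps(24, 12))  # 50.0
```

The worst case only applies when all hosts on a leaf send to other leaves at the same time. Traffic that stays on the same leaf does not use the uplinks.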
Advantages:
- Efficient for high-performance computing.
- Lowest and most deterministic latency of the topologies described here.
- Scalable via multiple layers (Leaf -> Spine -> Super Spine).
Dragonfly+ Topology
Dragonfly+ connects groups of compute nodes in a highly scalable, cost-effective manner.
- Groups: Inside a group, nodes are connected in a full bipartite (Leaf-Spine) topology.
- Inter-Group: Groups are connected to each other in a full mesh (all-to-all).
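The full mesh between groups has a direct cost in global links and ports, and it also determines how many routes exist between two groups. The sketch below works at the group level only and treats each group as a single vertex. The function names and the `links_per_pair` parameter are assumptions for illustration, not part of any InfiniBand tool.

```python
def inter_group_links(num_groups, links_per_pair=1):
    """Total global links needed to connect every pair of groups."""
    return num_groups * (num_groups - 1) // 2 * links_per_pair


def global_ports_per_group(num_groups, links_per_pair=1):
    """Global-facing ports each group must provide for the full mesh."""
    return (num_groups - 1) * links_per_pair


def group_level_routes(num_groups):
    """Routes between two distinct groups: (direct, via one intermediate group)."""
    return 1, max(num_groups - 2, 0)


groups = 8
print(inter_group_links(groups))       # 28
print(global_ports_per_group(groups))  # 7
print(group_level_routes(groups))      # (1, 6)
```

With 8 groups, any pair has one direct route and six routes through an intermediate group. Having this many alternatives is what gives a routing algorithm room to steer traffic around a busy direct link.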
Requirement: Dragonfly+ requires Adaptive Routing to function efficiently due to the multiple path options between groups.
Advantages:
- Supports a larger number of hosts than Fat-Tree for the same switch count.
- More cost-effective (fewer cables/switches) for large scales.
- High bandwidth and low latency.
Torus 3D Topology
In a 3D Torus, nodes are connected in a ring formation across three dimensions (x, y, z).
- Connections: Each switch connects to its 6 neighbors (2 in x, 2 in y, 2 in z).
- Resilience: If a link breaks, traffic can wrap around the ring in the other direction.
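The neighbor rule and the wrap-around behavior can be expressed with modular arithmetic on switch coordinates. This is a minimal sketch under assumed names: coordinates are `(x, y, z)` tuples, and `dims` is the torus size along each axis, here a hypothetical 4 x 4 x 4 layout.

```python
def neighbors(coord, dims):
    """Return the switches directly linked to coord in a 3D torus."""
    result = set()
    for axis in range(3):
        for step in (-1, 1):
            nxt = list(coord)
            nxt[axis] = (nxt[axis] + step) % dims[axis]  # wrap at the ring edge
            result.add(tuple(nxt))
    result.discard(coord)  # only relevant when a dimension has size 1
    return result


def hop_distance(a, b, dims):
    """Minimum switch-to-switch hops, taking the shorter way around each ring."""
    return sum(
        min(abs(p - q), size - abs(p - q))
        for p, q, size in zip(a, b, dims)
    )


dims = (4, 4, 4)
print(len(neighbors((0, 0, 0), dims)))           # 6
print((3, 0, 0) in neighbors((0, 0, 0), dims))   # True: wrap-around link
print(hop_distance((0, 0, 0), (3, 0, 0), dims))  # 1
print(hop_distance((0, 0, 0), (2, 2, 2), dims))  # 6
```

A switch has six distinct neighbors only when every dimension has at least three switches. With a dimension of size 2, both directions lead to the same neighbor.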
Advantages:
- Cost-Effective: Simple, short cabling. Ideal for massive installations (supercomputers).
- Fault Tolerance: Highly resilient due to multiple paths.
- Locality: Excellent for applications where communication is localized to neighbors.
Disadvantages:
- Higher hop count for distant nodes (higher latency).
- Typically has a higher oversubscription ratio (lower bisection bandwidth) than a Fat-Tree.