Skyway Gateway
The NVIDIA Skyway Gateway is an appliance that bridges InfiniBand (IB) fabrics to Ethernet networks, enabling IP-over-InfiniBand (IPoIB) connectivity between IB hosts and external Ethernet resources.
When Do You Actually Need Skyway?
In most modern AI clusters, hosts are multi-homed: they have both InfiniBand and Ethernet interfaces. IB handles GPU-to-GPU RDMA traffic while Ethernet provides management, storage, and external connectivity. In these environments, Skyway is unnecessary.
Skyway becomes relevant when physical constraints limit the available network infrastructure to InfiniBand only — for example, when rack space, cabling, or switch budget prevents deploying a parallel Ethernet fabric, but hosts still need IP connectivity to external services.
Overview
Skyway Gateway operates as both an InfiniBand host on the fabric and an IP router. It facilitates communication by routing traffic from IB HCAs to Ethernet networks using the standard IPoIB protocol.
- Protocol Support: Supports IPv4 addresses only.
- Function: Acts as a bridge/gateway for IB hosts to access Ethernet services (storage, management, external connectivity).
Architecture
The Skyway appliance is built on an x86 server platform equipped with multiple InfiniBand Host Channel Adapters (HCAs) and Ethernet ConnectX adapters.
Hardware Configuration
- Port Pairing: IB HCA ports and Ethernet adapter ports are bridged in pairs. Traffic entering a given IB port is forwarded out its corresponding Ethernet port.
- Throughput: High-performance implementations utilize 8x HDR InfiniBand ports and 8x 200GbE ports (ConnectX-6) to deliver up to 1.6 Tb/s of throughput per appliance.
- Scalability: Multiple appliances (e.g., up to 4) can be deployed in a single Skyway domain to scale total throughput.
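The headline throughput figure follows from simple port arithmetic. A minimal Python sketch, assuming a nominal line rate of 200 Gb/s per port (HDR and 200GbE) and ignoring protocol overhead; the function name is illustrative:

```python
def nominal_throughput_tbps(ports_per_appliance: int = 8,
                            gbps_per_port: int = 200,
                            appliances: int = 1) -> float:
    """Nominal aggregate line rate in Tb/s; real throughput will be lower."""
    return ports_per_appliance * gbps_per_port * appliances / 1000


print(nominal_throughput_tbps())              # one appliance
print(nominal_throughput_tbps(appliances=4))  # four appliances in one domain
```

These are upper bounds on line rate only; achievable throughput depends on traffic mix and packet sizes.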
Virtualization & Addressing
Skyway leverages SR-IOV (Single Root I/O Virtualization) to virtualize physical network resources.
- Virtual Functions (VFs): Each physical IB port is divided into multiple Virtual Functions.
- Addressing: Each VF is assigned its own unique Virtual GUID (V-GUID), Virtual GID (V-GID), and Virtual LID (V-LID).
- Gateway Redundancy: The gateway configures its IB ports into a port-channel, assigning a single IP address to this logical interface. This IP acts as the default gateway for IPoIB-enabled hosts within the IB fabric.
Call Flow
The communication process involves address resolution (ARP) and path queries to the Subnet Manager (SM) to establish connections.
InfiniBand to Ethernet
- Distribution: 64 VFs (V-GUIDs) are distributed across the InfiniBand ports on the gateway appliance.
- ARP Request: When an IB host needs to communicate with an Ethernet destination, it sends an ARP request for its default gateway (the Skyway IP). This broadcast is handled by the default HCA receiver on the Skyway appliance (typically ib0).
- Load Balancing: The Skyway kernel processes the ARP request. It selects a specific VF (and its V-GUID) to handle the traffic, load-balancing based on the source IP of the requesting host.
- Path Resolution: The IB host receives the V-GUID of the assigned gateway port. It then sends a Path Query to the Subnet Manager (SM) to resolve this V-GUID to a LID.
- SM Response: The Subnet Manager responds with the LID of the Skyway Gateway port.
- Data Transmission: The host encapsulates the IP packet into an InfiniBand packet and sends it to the gateway’s LID.
- Forwarding: The packet arrives at the specific IB port on the gateway. Hardware forwarding strips the IB headers and moves the IP packet to the paired Ethernet port.
- Egress: The Ethernet port routes the packet to the next hop or destination using its routing table.
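The VF selection in the load-balancing step can be illustrated with a short Python sketch. The actual selection function used by the Skyway kernel is not described here; the modulo on the integer form of the source IP and the V-GUID values below are assumptions, chosen only to show how a given source IP maps consistently to one of the 64 VFs.

```python
import ipaddress

NUM_VFS = 64  # from the text: 64 VFs are distributed across the gateway's IB ports

# Hypothetical V-GUID values, for illustration only.
V_GUIDS = [f"0x{0x0002c90300000000 + i:016x}" for i in range(NUM_VFS)]


def select_vf(source_ip: str) -> str:
    """Map a host's source IPv4 address to one V-GUID (assumed modulo scheme)."""
    index = int(ipaddress.IPv4Address(source_ip)) % NUM_VFS
    return V_GUIDS[index]


for host in ("192.168.0.10", "192.168.0.11", "192.168.0.75"):
    print(host, "->", select_vf(host))
```

Because the mapping depends only on the source IP, a host always resolves the gateway to the same V-GUID, while different hosts spread across the available VFs.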
Ethernet to InfiniBand
- Ingress: An external source sends an Ethernet packet destined for the IP address of an IB host.
- Routing: The external network routes the packet to the Skyway Ethernet port-channel.
- Internal Forwarding: The specific Ethernet port receiving the traffic forwards it to its paired IB port.
- ARP & Discovery: If the destination IB host’s MAC/GUID is not cached, the gateway sends an ARP request into the IB fabric.
- Host Response: The target IB host responds with its GUID.
- Path Query: The gateway queries the Subnet Manager to determine the LID for that GUID.
- SM Response: The Subnet Manager provides the destination LID.
- Encapsulation: The gateway encapsulates the Ethernet payload into an IPoIB packet and sends it to the IB host’s LID.
- Processing: The IB host receives and processes the packet.
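The lookup sequence above (cache check, ARP into the fabric, then a path query to the SM) can be sketched in Python. The dictionaries stand in for the gateway's neighbor cache, the fabric's ARP responses, and the SM's path records; all names and values are hypothetical, and caching the LID alongside the GUID is an assumption.

```python
# Hypothetical stand-ins for fabric state; not real Skyway data structures.
FABRIC_ARP = {"192.168.0.10": "0x0002c90300aa0001"}  # IP -> host GUID
SM_PATHS = {"0x0002c90300aa0001": 17}                # GUID -> LID

neighbor_cache: dict[str, tuple[str, int]] = {}


def resolve_ib_destination(dest_ip: str) -> tuple[str, int]:
    """Return (GUID, LID) for an IB host, resolving and caching on a miss."""
    if dest_ip in neighbor_cache:      # cached: skip ARP and path query
        return neighbor_cache[dest_ip]
    guid = FABRIC_ARP[dest_ip]         # ARP request and host response
    lid = SM_PATHS[guid]               # path query and SM response
    neighbor_cache[dest_ip] = (guid, lid)
    return guid, lid


print(resolve_ib_destination("192.168.0.10"))
```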
Basic Deployment
A basic Skyway deployment involves configuring the IB port-channel, the Ethernet port-channel, the Subnet Manager for virtualization, and the IB hosts.
Subnet Manager Configuration
The Subnet Manager must have virtualization enabled for Skyway’s SR-IOV to function.
OpenSM: add the following to opensm.conf:
    virt_enabled 2
    virt_max_ports_in_process 0
Managed IB Switch: if using the embedded SM on a managed switch:
    configure terminal
    ib sm virt enable
    ib sm virt-max-ports-in-progress 0
    write memory
InfiniBand Side (IPoIB Port-Channel)
Skyway uses a Cisco-like CLI (similar to MLNX-OS on NVIDIA IB switches). Configure the IB port-channel with a source IP and a virtual IP that IB hosts will use as their default gateway.
    enable
    configure terminal
    interface ib port-channel 1 ip address 192.168.0.254/24
    interface ib port-channel 1 virtual ip address 192.168.0.1/24
    interface ib port-channel 1 mtu 4092
    write memory
The MTU must be set to 4092 or lower. InfiniBand’s maximum MTU is 4096 bytes, and the IPoIB encapsulation header takes 4 of them, leaving at most 4092 bytes for the IP packet.
IB Host Configuration
Configure each IB host with an IP on the IPoIB subnet and a default route pointing to the Skyway virtual IP.
    ifconfig ib0 192.168.0.10/24
    ip route add 0/0 via 192.168.0.1
For production, make these settings persistent using netplan or equivalent.
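A quick sanity check of the addressing plan can catch a default gateway that falls outside the host's subnet. A minimal sketch using Python's standard ipaddress module with the example addresses from this section; the helper name is illustrative:

```python
import ipaddress


def gateway_reachable(host_cidr: str, gateway_ip: str) -> bool:
    """True if the gateway is on-link for the host interface and distinct from it."""
    iface = ipaddress.ip_interface(host_cidr)
    gateway = ipaddress.ip_address(gateway_ip)
    return gateway in iface.network and gateway != iface.ip


print(gateway_reachable("192.168.0.10/24", "192.168.0.1"))  # Skyway virtual IP
print(gateway_reachable("192.168.0.10/24", "192.168.1.1"))  # address on another subnet
```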
Ethernet Side
Configure the Ethernet port-channel on the Skyway appliance. LACP must be set to active mode on the remote Ethernet switch.
    enable
    configure terminal
    interface ethernet port-channel 1 ip address 192.168.1.2/30
    interface ethernet port-channel 1 mtu 4090
    ip route 0.0.0.0/0 192.168.1.1
    write memory
The Ethernet MTU should be lower than the IB port-channel MTU.
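The two MTU rules stated in this section (an IB port-channel MTU of 4092 or lower, and an Ethernet MTU lower than the IB MTU) can be checked before a configuration is applied. A minimal Python sketch; the function is a hypothetical helper, not part of any Skyway tooling:

```python
MAX_IPOIB_MTU = 4092  # upper bound stated for the IB port-channel


def check_mtus(ib_mtu: int, eth_mtu: int) -> list[str]:
    """Return a list of rule violations; an empty list means the pair is acceptable."""
    problems = []
    if ib_mtu > MAX_IPOIB_MTU:
        problems.append(f"IB MTU {ib_mtu} exceeds {MAX_IPOIB_MTU}")
    if eth_mtu >= ib_mtu:
        problems.append(f"Ethernet MTU {eth_mtu} is not lower than IB MTU {ib_mtu}")
    return problems


print(check_mtus(4092, 4090))  # values used in the examples above
print(check_mtus(4096, 4096))  # violates both rules
```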
Remote Ethernet Switch Example
An example configuration for the Ethernet switch connected to the Skyway appliance:
    enable
    configure terminal
    interface port-channel 1
    no shut
    exit
    interface ethernet 0/1-0/8
    channel-group 1 mode active
    no shut
    exit
    vlan 10
    exit
    interface port-channel 1 switchport access vlan 10
    interface port-channel 1 switchport mode access
    interface vlan 10 ip address 192.168.1.1/24
    ip route 192.168.0.0/24 192.168.1.2
    end
    wr mem
The static route for 192.168.0.0/24 uses the Skyway Ethernet IP (192.168.1.2, as configured on the Ethernet port-channel) as the next hop for the IPoIB subnet.
High Availability
Skyway supports HA deployments with up to 4 appliances in a single gateway domain. All appliances share a common LACP port-channel across the Ethernet side, similar to a VPC/MLAG design.
Domain Roles
Each appliance in the domain holds one of three roles:
- Master Gateway — Only one per domain. Responsible for V-GUID assignment, load balancing, and overall domain coordination.
- Active Backup Gateway(s) — Actively forwarding traffic and ready to assume the master role.
- Non-Active Backup Gateway(s) — Standing by for failover.
Each domain member distributes its IB host list to all other members in the domain.
Prerequisites
- MLNX-GW OS version must be identical across all appliances.
- All appliances must share the same L2 management subnet.
- All appliances must use the same HA domain ID.
- All Skyway Ethernet interfaces must be connected to L3 router interfaces.
- Virtual IP and Ethernet port-channel configuration must be identical on all appliances.
- Even single-appliance deployments should include HA configuration to simplify future scale-out.
HA Configuration
On the master appliance, set a higher priority to ensure election as master:
    gw ha 1
    gw ha priority 100
On all other appliances, join the same domain:
    gw ha 1
All appliances require a reboot after HA configuration. Validate the setup with:
    show gw ha
Remote Ethernet Switch (Multi-Appliance)
When multiple Skyway appliances are in the same domain, all their Ethernet ports join a single port-channel on the remote switch. For example, with two appliances (ports 0/1-0/8 for appliance 1, 0/9-0/16 for appliance 2):
    enable
    configure terminal
    interface port-channel 1
    no shut
    exit
    interface ethernet 0/1-0/16
    channel-group 1 mode active
    no shut
    exit
    vlan 10
    exit
    interface port-channel 1 switchport access vlan 10
    interface port-channel 1 switchport mode access
    interface vlan 10 ip address 192.168.1.1/24
    ip route 192.168.0.0/24 192.168.1.2
    end
    wr mem
Failover Behavior
If a gateway appliance fails (hardware failure, cabling issue, or port configuration change), its V-GUIDs are automatically reassigned to other HCAs in the domain. IB hosts see no disruption; from their perspective, the gateway remains reachable.
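The reassignment can be pictured with a small Python sketch. The round-robin redistribution, the appliance names, and the V-GUID labels are assumptions for illustration; how the domain actually chooses the new owners is not specified here.

```python
from itertools import cycle


def reassign_on_failure(assignments: dict[str, list[str]],
                        failed: str) -> dict[str, list[str]]:
    """Move the failed appliance's V-GUIDs to the survivors, round-robin (assumed policy)."""
    survivors = {gw: list(vfs) for gw, vfs in assignments.items() if gw != failed}
    if not survivors:
        raise ValueError("no surviving gateway in the domain")
    targets = cycle(sorted(survivors))
    for vguid in assignments.get(failed, []):
        survivors[next(targets)].append(vguid)
    return survivors


# Hypothetical three-appliance domain with two V-GUIDs each.
domain = {"gw1": ["vf0", "vf1"], "gw2": ["vf2", "vf3"], "gw3": ["vf4", "vf5"]}
print(reassign_on_failure(domain, "gw2"))
```

Every V-GUID keeps an owner after the failure, which is why hosts that already resolved a V-GUID only need a fresh path to its new location.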
Multi-Tenant Support (Multiple P_Keys)
Skyway supports multiple IPoIB subnets, one per partition key (P_Key), enabling multi-tenant environments where each partition operates as an isolated network segment (similar to VLANs).
- Each P_Key gets its own IPoIB subnet, but all share the same Skyway domain.
- A single domain supports up to 20 IPoIB subnets (best practice recommends 10 or fewer to avoid longer boot times).
- P_Key interfaces only support IPv4.
- All fabric nodes are connected to the management P_Key (0x7FFF) by default.
Gateway Configuration
To configure a P_Key-specific IPoIB subnet on the Skyway IB port-channel (using P_Key 0x1 as an example):
    configure terminal
    interface ib port-channel 1 pkey 0x1
    interface ib port-channel 1 pkey 0x1 ip address 192.168.0.254 255.255.255.0
    interface ib port-channel 1 pkey 0x1 virtual ip address 192.168.0.1 255.255.255.0
    write memory
Host Configuration
On the IB host, create an interface for the P_Key using the format ib0.8<pkey> (for example, ib0.8001 for P_Key 0x1):
    ifconfig ib0.8001 192.168.0.10/24
    ip route add 0/0 via 192.168.0.1
Verification
View all P_Key interfaces on the Skyway appliance:
    show interfaces ib port-channel 1 pkey brief
To display SM-configured P_Keys, check partitions.conf on the Subnet Manager. Running ifconfig on a host will show its configured P_Key interface.
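The host-side interface name used in the Host Configuration step can be derived from the P_Key value. The sketch below assumes the usual Linux IPoIB convention in which the child interface suffix is the 16-bit P_Key with the full-membership bit (0x8000) set, which is consistent with P_Key 0x1 yielding ib0.8001; the helper name is illustrative.

```python
FULL_MEMBER_BIT = 0x8000  # high bit of the 16-bit P_Key marks full membership


def pkey_interface_name(parent: str, pkey: int) -> str:
    """Build the IPoIB child interface name for a P_Key base value."""
    if not 0 < pkey <= 0x7FFF:
        raise ValueError("P_Key base value must be in 0x0001..0x7FFF")
    return f"{parent}.{pkey | FULL_MEMBER_BIT:04x}"


print(pkey_interface_name("ib0", 0x1))
```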
Operational Commands
Host Validation
| Command | Purpose |
|---|---|
| ifconfig / ip addr | Verify IPoIB interface IP configuration |
| route / ip route | Verify default route points to Skyway virtual IP |
| ping / ibping | Test connectivity |
| ibstat | Check local HCA link status |
| ibswitches | List switches in the fabric |
| iblinkinfo | Show link connectivity details |
Gateway Commands
| Command | Purpose |
|---|---|
| show interfaces ib | Display all IB interfaces |
| show interfaces ib port-channel | Display IB port-channel status |
| show interfaces eth port-channel | Display Ethernet port-channel status |
| show gw vf-distribution | Show VF-to-HCA port assignments |
| show gw ha | Display HA domain status and roles |
| show asic version | Display HCA firmware versions |
| show images | List available OS images |
| show version | Show installed OS version |
Validation with ibdiagnet
ibdiagnet can be used to validate Virtual Functions. Compare the output file ibdiagnet2.vports with the Skyway command show gw vf-distribution to confirm VF assignments match.
Virtualization settings are stored in opensm.conf on the Subnet Manager.
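Once both outputs have been parsed, the VF comparison itself is a dictionary diff. The Python sketch below assumes each source has already been reduced to a mapping from V-GUID to gateway port; the parsing step is omitted because the exact file and CLI output formats are not covered here, and the function name is hypothetical.

```python
def diff_vf_assignments(ibdiagnet_vports: dict[str, str],
                        gateway_distribution: dict[str, str]) -> dict[str, tuple]:
    """Return V-GUIDs whose port differs or that appear in only one source."""
    mismatches = {}
    for vguid in ibdiagnet_vports.keys() | gateway_distribution.keys():
        seen = ibdiagnet_vports.get(vguid)           # port according to ibdiagnet
        expected = gateway_distribution.get(vguid)   # port according to the gateway
        if seen != expected:
            mismatches[vguid] = (seen, expected)
    return mismatches
```

An empty result means the two views agree; a value of None on either side marks a V-GUID that only one source reports.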