DGX/HGX GPU Cluster Network Topologies: Fat-Tree, Spine-Leaf, and Dragonfly+ Compared
Introduction
Selecting the right network topology is one of the most critical decisions when designing GPU clusters for AI training. The topology determines bandwidth availability, latency characteristics, scalability limits, and total cost of ownership. This article provides an in-depth comparison of the three dominant topologies for DGX and HGX clusters: Fat-Tree, Spine-Leaf (CLOS), and Dragonfly+.
Topology Fundamentals
Network topology defines how switches and compute nodes are interconnected. For AI clusters, the ideal topology must provide the following (a short calculation sketch follows this list):
- High bisection bandwidth: Any half of the cluster can communicate with the other half at full speed
- Low diameter: Minimum number of hops between any two nodes
- Scalability: Ability to grow from hundreds to tens of thousands of nodes
- Fault tolerance: Multiple paths between endpoints for redundancy
- Cost efficiency: Optimal balance of performance and capital expenditure
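To make the first two metrics concrete, the sketch below computes oversubscription and bisection bandwidth for a generic two-tier leaf-spine fabric. The port counts and link speeds are made-up illustrative values, not a specific switch or cluster.

```python
# Illustrative calculations for two of the metrics above. All port counts and
# speeds are example values, not a specific product.

def oversubscription_ratio(downlinks_per_leaf: int, uplinks_per_leaf: int) -> float:
    """Server-facing vs. spine-facing bandwidth on a leaf (assuming equal port speeds).
    1.0 means non-blocking; 2.0 means 2:1 oversubscribed."""
    return downlinks_per_leaf / uplinks_per_leaf

def bisection_bandwidth_gbps(num_leaves: int, uplinks_per_leaf: int, link_speed_gbps: float) -> float:
    """Worst-case bandwidth between two halves of the cluster. With every leaf
    wired to every spine, the limiting cut crosses the uplinks of half the leaves."""
    return num_leaves * uplinks_per_leaf * link_speed_gbps / 2

if __name__ == "__main__":
    # Example: 16 leaves, each with 32 x 400G downlinks and 16 x 400G uplinks (2:1 tapered).
    print("oversubscription:", oversubscription_ratio(32, 16))          # -> 2.0
    print("bisection (Gb/s):", bisection_bandwidth_gbps(16, 16, 400.0)) # -> 51200.0
```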
Fat-Tree Topology
Architecture
Fat-Tree is a multi-rooted tree where bandwidth increases toward the core. A typical 3-tier Fat-Tree consists of:
- Edge/Leaf Layer: Switches directly connected to GPU servers
- Aggregation/Spine Layer: Intermediate switches connecting leaf switches
- Core Layer: Top-tier switches providing inter-pod connectivity (for very large deployments)
In a pure Fat-Tree, every leaf switch connects to every spine switch, creating a non-blocking fabric with full bisection bandwidth.
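For a sense of scale, the sketch below sizes the textbook 3-tier fat-tree built from identical k-port switches (the classic k-ary construction). Real DGX/HGX fabrics mix radixes and tiering, so treat these formulas as an idealized model rather than a vendor design.

```python
# Sizing of the idealized 3-tier "k-ary" fat-tree built from identical k-port switches.

def fat_tree_size(k: int) -> dict:
    assert k % 2 == 0, "the construction assumes an even switch radix"
    hosts = k ** 3 // 4                 # k pods x (k/2 edge switches) x (k/2 hosts each)
    edge = agg = k * (k // 2)           # k pods, each with k/2 edge and k/2 aggregation switches
    core = (k // 2) ** 2
    cables = 3 * hosts                  # host-edge, edge-agg, and agg-core links each total k^3/4
    return {"hosts": hosts, "switches": edge + agg + core, "cables": cables}

if __name__ == "__main__":
    for k in (16, 32, 64):
        print(k, fat_tree_size(k))
    # k=64 (a 64-port switch): 65,536 hosts, 5,120 switches, 196,608 cables
```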
Key Characteristics
- Bisection Bandwidth: 100% (non-blocking)
- Diameter: 4-6 hops in a 3-tier design (leaf → spine → core → spine → leaf, plus the host links)
- Scalability: Up to 100,000+ endpoints with 3-tier design
- Redundancy: up to N equal-cost paths between servers on different leaves (N = number of spine switches)
Advantages
- Predictable, deterministic performance
- Well-understood design patterns and operational practices
- Full bisection bandwidth eliminates network bottlenecks
- Excellent for all-to-all communication (gradient synchronization)
Disadvantages
- High cable count: a full-bisection 3-tier fabric needs roughly two inter-switch cables per endpoint on top of the host links, most of them long cross-rack runs
- Expensive: requires many high-radix switches
- Power consumption scales linearly with cluster size
- Physical cabling complexity in large deployments
Best Use Cases
- Clusters with 100-5,000 GPUs
- Workloads requiring guaranteed bandwidth (LLM training)
- Environments where predictability trumps cost
Spine-Leaf (CLOS) Topology
Architecture
Spine-Leaf is a 2-tier CLOS fabric, a generalization of Fat-Tree optimized for data center deployments:
- Leaf Layer: Top-of-Rack (ToR) switches connecting servers
- Spine Layer: Aggregation switches providing inter-leaf connectivity
Every leaf connects to every spine, but unlike Fat-Tree, Spine-Leaf allows for asymmetric designs (e.g., different port counts, oversubscription ratios).
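The sketch below turns that flexibility into numbers: given a leaf and spine radix and a target oversubscription ratio, it derives a port split per leaf and the resulting fabric ceiling. The port counts are illustrative assumptions, not a particular switch model.

```python
# Rough sizing of a two-tier spine-leaf fabric from switch radix and a chosen
# oversubscription ratio. Assumes one uplink from every leaf to every spine.

def leaf_spine_plan(leaf_ports: int, spine_ports: int, oversub: float) -> dict:
    """Split each leaf's ports so that downlinks / uplinks == oversub, then
    derive the spine count and the maximum fabric size."""
    uplinks = int(leaf_ports / (1 + oversub))
    downlinks = leaf_ports - uplinks
    max_leaves = spine_ports            # each spine port terminates one leaf
    return {
        "downlinks_per_leaf": downlinks,
        "uplinks_per_leaf": uplinks,
        "spines": uplinks,              # one uplink per spine from every leaf
        "max_leaves": max_leaves,
        "max_servers": downlinks * max_leaves,
    }

if __name__ == "__main__":
    print("1:1 :", leaf_spine_plan(64, 64, 1.0))  # 32 down / 32 up per leaf, 2,048 servers max
    print("2:1 :", leaf_spine_plan(64, 64, 2.0))  # 43 down / 21 up per leaf (~2:1), 2,752 servers max
```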
Key Characteristics
- Bisection Bandwidth: 50-100% (configurable via oversubscription)
- Diameter: 2 hops (leaf → spine → leaf)
- Scalability: 10,000-100,000 endpoints
- Flexibility: Supports tapered designs (2:1, 4:1 oversubscription)
Advantages
- Lower latency than Fat-Tree (fewer hops)
- Flexible oversubscription allows cost optimization
- Industry-standard design with broad vendor support
- Easier to scale incrementally (add spine switches as needed)
Disadvantages
- Oversubscribed designs can create bottlenecks
- Requires careful traffic engineering to avoid hotspots
- Still requires significant cabling (though less than Fat-Tree)
Best Use Cases
- General-purpose GPU clusters (mixed training/inference)
- Deployments prioritizing cost-performance balance
- Clusters with locality-aware workload placement
DGX SuperPOD Example
NVIDIA's DGX SuperPOD reference architectures use a rail-optimized, non-blocking Spine-Leaf InfiniBand compute fabric:
- DGX A100 generation: NVIDIA Quantum QM8700 switches (40 ports @ 200 Gb/s HDR), organized in scalable units of 20 DGX A100 systems
- DGX H100 generation: NVIDIA Quantum-2 QM9700 switches (64 ports @ 400 Gb/s NDR, 25.6 Tb/s of switching capacity per switch)
- Configuration: each of a node's eight compute HCAs connects to its own leaf switch ("rail"), and every leaf connects to every spine
- Oversubscription: 1:1 (non-blocking) on the compute fabric; the port-budget sketch below checks the A100-generation numbers
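As a sanity check on those figures, here is a minimal port-budget sketch for one rail-optimized scalable unit, assuming 20 nodes per unit, 8 compute HCAs per node, and 40-port leaf switches as quoted above. It illustrates the arithmetic and is not a reproduction of NVIDIA's reference architecture tables.

```python
# Port-budget check for one rail-optimized scalable unit (SU). The values in the
# example call mirror the DGX A100-generation figures quoted above and are
# otherwise assumptions for illustration.

def su_leaf_budget(nodes_per_su: int, rails: int, leaf_ports: int, oversub: float = 1.0) -> dict:
    # Rail-optimized wiring: HCA i of every node lands on leaf switch i, so each
    # of the `rails` leaves sees exactly one port from every node in the SU.
    downlinks = nodes_per_su
    uplinks = int(round(downlinks / oversub))
    return {
        "leaf_switches": rails,
        "downlinks_per_leaf": downlinks,
        "uplinks_per_leaf": uplinks,
        "ports_used_per_leaf": downlinks + uplinks,
        "fits": downlinks + uplinks <= leaf_ports,
    }

if __name__ == "__main__":
    print(su_leaf_budget(nodes_per_su=20, rails=8, leaf_ports=40))
    # -> 20 down + 20 up = 40 ports, exactly filling a 40-port leaf at 1:1
```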
Dragonfly+ Topology
Architecture
Dragonfly+ is a hierarchical topology designed for extreme-scale systems (10,000+ nodes). It organizes nodes into groups with dense intra-group connectivity and sparse inter-group links:
- Intra-Group: Switches within a group are densely connected; in Dragonfly+ each group is a small leaf-spine (CLOS) pod, rather than the full mesh of classic Dragonfly
- Inter-Group: Each group's spine switches carry global links to switches in other groups
- Hierarchical: Can be extended to multiple levels (groups of groups)
Key Characteristics
- Bisection Bandwidth: 40-60% (lower than Fat-Tree, but sufficient for most workloads)
- Diameter: 3 hops on a minimal path (source leaf → local spine → remote spine via a global link → destination leaf)
- Scalability: 100,000+ endpoints with 2-level hierarchy
- Cable Efficiency: far fewer long global cables than an equivalent Fat-Tree, since each pair of groups needs only a small number of direct links (see the sizing sketch below)
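A rough sizing sketch makes the cable argument tangible. It models groups the way the text describes Dragonfly+ (a small leaf/spine pod per group, with the spines carrying global links); the parameters L, S, p, and h are generic assumptions for illustration, not values from any vendor document.

```python
# Dragonfly+ sizing sketch: L leaf and S spine switches per group, p endpoints
# per leaf, h global links per spine. With one global link between every pair
# of groups, a fully connected group graph supports up to S*h + 1 groups.

def dragonfly_plus(L: int, S: int, p: int, h: int) -> dict:
    groups = S * h + 1
    return {
        "groups": groups,
        "endpoints": groups * L * p,
        "switches": groups * (L + S),
        "intra_group_links": groups * L * S,          # full bipartite wiring inside each group
        "global_links": groups * (groups - 1) // 2,   # the long-reach, typically optical, cables
    }

if __name__ == "__main__":
    # Example: 16 leaves + 16 spines per group, 16 endpoints per leaf, 16 global links per spine.
    print(dragonfly_plus(L=16, S=16, p=16, h=16))
    # -> 257 groups, 65,792 endpoints, 8,224 switches, 65,792 intra-group + 32,896 global links
```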
Advantages
- Dramatically reduced cable count (50-70% fewer than Fat-Tree)
- Lower cost per port at extreme scale
- Excellent for workloads with locality (model parallelism within groups)
- Lower power consumption due to fewer switches
Disadvantages
- Complex routing algorithms required (adaptive routing essential)
- Performance depends heavily on traffic patterns
- Less predictable than Fat-Tree for all-to-all traffic
- Requires sophisticated workload placement strategies
Best Use Cases
- Extreme-scale clusters (10,000+ GPUs)
- Workloads with strong locality (pipeline parallelism, federated learning)
- Cost-sensitive deployments where 100% bisection bandwidth isn't required
Topology Comparison Table
| Dimension | Fat-Tree | Spine-Leaf | Dragonfly+ |
|---|---|---|---|
| Bisection BW | 100% | 50-100% | 40-60% |
| Diameter | 4-6 hops | 2 hops | 3 hops |
| Scalability | 100K nodes | 100K nodes | 1M+ nodes |
| Cable Count | Very High | High | Medium |
| Cost (relative) | Highest | Medium | Lowest |
| Complexity | Low | Low | High |
| Predictability | Excellent | Good | Fair |
Choosing the Right Topology
For Small-Medium Clusters (100-1,000 GPUs)
Recommendation: Spine-Leaf (2-tier CLOS)
- Optimal balance of cost, performance, and simplicity
- 2-hop latency ideal for training workloads
- Easy to deploy and operate
For Large Clusters (1,000-10,000 GPUs)
Recommendation: Fat-Tree or Spine-Leaf with minimal oversubscription
- Full bisection bandwidth critical at this scale
- Predictable performance justifies higher cost
- Operational maturity of these topologies reduces risk
For Extreme-Scale Clusters (10,000+ GPUs)
Recommendation: Dragonfly+ or multi-tier CLOS
- Cable reduction becomes critical at this scale
- Workload placement strategies can mitigate lower bisection bandwidth
- Cost savings of 30-50% vs. Fat-Tree; the helper below encodes these sizing rules of thumb
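As a compact summary of this guidance, the helper below encodes the article's own rules of thumb. The thresholds come from the recommendations above, not from any formal standard, and the `needs_full_bisection` flag is a hypothetical stand-in for a real workload analysis.

```python
# Encodes the sizing guidance above; thresholds are this article's rules of thumb.

def recommend_topology(gpu_count: int, needs_full_bisection: bool = True) -> str:
    if gpu_count < 1_000:
        return "2-tier Spine-Leaf (CLOS)"
    if gpu_count < 10_000:
        return "Fat-Tree or Spine-Leaf with minimal oversubscription"
    # Extreme scale: locality-aware placement can offset Dragonfly+'s lower bisection bandwidth.
    return "Multi-tier CLOS" if needs_full_bisection else "Dragonfly+"

if __name__ == "__main__":
    for n in (512, 4_096, 32_768):
        print(n, "->", recommend_topology(n))
```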
Hybrid Approaches
Many deployments use hybrid topologies:
- Intra-Pod Fat-Tree + Inter-Pod Dragonfly: Full bandwidth within training pods, sparse connectivity between pods
- Spine-Leaf with Rail Optimization: each GPU NIC ("rail") attaches to its own leaf switch so same-rail traffic stays local, typically alongside separate fabrics for storage and management traffic (a small wiring sketch follows this list)
- Hierarchical CLOS: Multiple spine layers for mega-scale deployments
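To make the rail idea concrete, here is a minimal wiring sketch for rail optimization; the node and switch names are illustrative placeholders rather than any specific reference architecture.

```python
# Rail-optimized cabling sketch: NIC i of every server connects to leaf switch i
# ("rail i"), so collectives that pair the same NIC index across servers stay on
# one leaf before reaching the spine. Names and counts are illustrative.

from collections import defaultdict

def rail_wiring(num_servers: int, nics_per_server: int) -> dict:
    """Return {rail_leaf: [(server, nic), ...]} for a rail-optimized layout."""
    wiring = defaultdict(list)
    for server in range(num_servers):
        for nic in range(nics_per_server):
            wiring[f"leaf-rail-{nic}"].append((f"node-{server:03d}", nic))
    return dict(wiring)

if __name__ == "__main__":
    plan = rail_wiring(num_servers=4, nics_per_server=8)
    for leaf, ports in plan.items():
        print(leaf, ports)   # each rail leaf sees one NIC from every server
```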
Conclusion
There is no one-size-fits-all topology for GPU clusters. Fat-Tree and Spine-Leaf dominate the 100-10,000 GPU range due to their predictability and operational maturity. Dragonfly+ emerges as the cost-effective choice for extreme-scale deployments where workload locality can be exploited.
When selecting a topology, consider:
- Cluster size and growth trajectory
- Workload characteristics (all-to-all vs. localized communication)
- Budget constraints (CapEx and OpEx)
- Operational expertise and tooling
For most organizations deploying DGX or HGX clusters today, a 2-tier Spine-Leaf fabric with 400G/800G optics and 1:1 or 2:1 oversubscription represents the sweet spot of performance, cost, and operational simplicity.