Review: Understanding Lifecycle Management Complexity of Datacenter Topologies
Introduction
When building large-scale datacenter networks, performance metrics like low latency and high throughput are important, but complexity is another critical factor that is often overlooked. Complexity affects how easily the network can be deployed, maintained, and scaled over its lifecycle. This post discusses the metrics the paper proposes for quantifying datacenter network complexity and introduces FatClique, a novel topology designed to reduce complexity while maintaining performance.
Measuring Deployment Complexity
Deploying datacenter networks involves packaging switches, placing equipment, and bundling/connecting cabling. Three key metrics capture the deployment complexity:
1. Number of Switches: More switches increase packaging and placement effort.
2. Number of Patch Panels: Patch panels make cable bundling possible, but more inter-rack connections require more panels.
3. Number of Bundle Types: Fewer unique combinations of cable bundle length and capacity simplify cabling.
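As a rough illustration of how these three counts could be derived from a cabling plan, here is a minimal Python sketch. The `Link` record, its fields, the `panel_ports` value, and the rule that every inter-rack link uses exactly one patch panel port are simplifying assumptions made for this post, not the paper's exact definitions.

```python
import math
from collections import Counter
from typing import NamedTuple


class Link(NamedTuple):
    """One switch-to-switch link in a hypothetical cabling plan."""
    switch_a: str
    rack_a: str
    switch_b: str
    rack_b: str
    length_m: int  # cable length, rounded to a stock length


def deployment_metrics(links, panel_ports=48):
    """Return (switches, patch_panels, bundle_types) for a list of links."""
    switches = {s for link in links for s in (link.switch_a, link.switch_b)}

    # Assumption: only inter-rack links pass through a patch panel, one port each.
    inter_rack = [link for link in links if link.rack_a != link.rack_b]
    patch_panels = math.ceil(len(inter_rack) / panel_ports)

    # A bundle groups all links of the same length between the same pair of racks.
    bundles = Counter()
    for link in inter_rack:
        rack_pair = tuple(sorted((link.rack_a, link.rack_b)))
        bundles[(rack_pair, link.length_m)] += 1

    # A bundle type is (number of cables, length); equal types are interchangeable.
    bundle_types = {(count, length) for (_, length), count in bundles.items()}

    return len(switches), patch_panels, len(bundle_types)


links = [
    Link("s1", "r1", "s2", "r1", 1),   # intra-rack, no patch panel
    Link("s1", "r1", "s3", "r2", 10),
    Link("s2", "r1", "s4", "r2", 10),
    Link("s1", "r1", "s5", "r3", 10),
    Link("s2", "r1", "s6", "r3", 10),
    Link("s3", "r2", "s5", "r3", 20),
]
print(deployment_metrics(links))  # (6, 1, 2)
```

In this toy plan the r1-r2 and r1-r3 bundles share a type (two cables of 10 m each), so three bundles collapse into two bundle types. That reuse is what a structured topology tries to maximize.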
Measuring Expansion Complexity
As demand grows, networks must expand gradually while minimizing disruption and maintaining capacity. Two metrics gauge the complexity:
1. Number of Expansion Steps: Each step requires link rewiring that cannot be fully parallelized. Fewer steps are better.
2. Rewired Links per Patch Panel Rack per Step: Because patch panel racks can be rewired in parallel, the busiest rack determines how long each step takes.
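To make the two expansion metrics concrete, the following sketch computes them from a made-up expansion plan. The plan format (one dictionary per step, mapping a patch panel rack name to the number of links rewired there) and the function name are hypothetical. The sketch also assumes that racks within a step are worked on in parallel while steps run strictly one after another.

```python
def expansion_metrics(plan):
    """plan: list of steps; each step maps patch-panel-rack name -> links rewired."""
    num_steps = len(plan)

    # Within a step, racks are rewired in parallel, so the busiest rack sets the pace.
    per_step_bottleneck = [max(step.values(), default=0) for step in plan]

    # Steps are sequential, so their bottlenecks add up to a rough time proxy.
    total_sequential_rewires = sum(per_step_bottleneck)

    return num_steps, per_step_bottleneck, total_sequential_rewires


plan = [
    {"pp-rack-1": 120, "pp-rack-2": 96},
    {"pp-rack-1": 80, "pp-rack-2": 110},
    {"pp-rack-1": 64},
]
print(expansion_metrics(plan))  # (3, [120, 110, 64], 294)
```

Under these assumptions, spreading rewiring evenly across patch panel racks shortens each step, while the step count is set by how much capacity can be taken offline at once.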
Comparing Existing Topologies
The paper uses the proposed metrics to evaluate existing datacenter topologies: multi-layer Clos and expander graph designs (Jellyfish, Xpander). Clos networks have fewer bundle types but require more patch panels, while expanders need many patch panels of their own because of their higher inter-rack connectivity.
For expansion, networks with a higher “north-south capacity ratio” (more uplink capacity relative to downlink capacity) can rewire more links per step while still retaining enough residual capacity. This “fat edge” property enables faster, less disruptive expansions.
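The intuition can be checked with a small back-of-the-envelope model. This is a deliberate simplification for this post, not the paper's analysis. It assumes the ratio is northbound capacity divided by southbound capacity, that the capacity SLO is the fraction of southbound capacity that must stay servable during a step, and that a fixed fraction of uplinks must eventually be rewired. The function names are hypothetical.

```python
import math
from fractions import Fraction


def max_drain_fraction(ns_ratio, slo):
    """Largest fraction of uplinks that can be drained in one step.

    Toy model: uplink capacity is ns_ratio times downlink capacity, and the
    uplinks left in service must still carry slo times downlink capacity.
    """
    # Exact fractions avoid floating-point surprises such as 1 - 0.9 != 0.1.
    ns_ratio, slo = Fraction(str(ns_ratio)), Fraction(str(slo))
    if ns_ratio <= 0 or not 0 < slo <= 1:
        raise ValueError("need ns_ratio > 0 and 0 < slo <= 1")
    return max(Fraction(0), 1 - slo / ns_ratio)


def min_steps(ns_ratio, slo, rewire_fraction=1):
    """Fewest steps needed to rewire rewire_fraction of the uplinks."""
    drain = max_drain_fraction(ns_ratio, slo)
    if drain == 0:
        raise ValueError("no spare uplink capacity: nothing can be drained")
    return math.ceil(Fraction(str(rewire_fraction)) / drain)


for ratio in (1, 1.5, 2):
    print(ratio, min_steps(ratio, slo=0.9))
```

In this model, at a 90% SLO a ratio of 1 allows only 10% of uplinks to be drained at a time, so rewiring every uplink takes 10 steps, whereas a ratio of 1.5 needs 3 steps and a ratio of 2 needs 2.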
Introducing FatClique
The FatClique topology incorporates hierarchy, structure, and fat edges to reduce complexity based on the defined metrics:
- Three hierarchy levels: sub-blocks, blocks, and the overall FatClique
- Clique interconnects at each level for high connectivity
- Constrained to have a fat edge (more uplink than downlink capacity) at every level
- Structured design enables bundling and reduces patch panels
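The nested-clique idea can be sketched with a toy graph builder. The wiring rule below (switch i in one sub-block links to switch i in each sibling sub-block, and likewise across blocks) is an assumption chosen to keep the example short. It is not the paper's actual link placement, and it ignores server ports and the fat edge constraint.

```python
from collections import Counter
from itertools import combinations


def link_level(u, v):
    """Classify a switch pair; switches are (block, sub_block, index) tuples."""
    (bu, su, iu), (bv, sv, iv) = u, v
    if bu == bv and su == sv:
        return "sub-block"   # clique of switches inside a sub-block
    if bu == bv and iu == iv:
        return "block"       # clique of sub-blocks inside a block
    if bu != bv and su == sv and iu == iv:
        return "fatclique"   # clique of blocks
    return None              # no direct link in this toy wiring


def build_toy_fatclique(blocks, sub_blocks, switches):
    """Return (nodes, links), where links maps a switch pair to its level."""
    nodes = [(b, s, i)
             for b in range(blocks)
             for s in range(sub_blocks)
             for i in range(switches)]
    links = {}
    for u, v in combinations(nodes, 2):
        level = link_level(u, v)
        if level is not None:
            links[(u, v)] = level
    return nodes, links


nodes, links = build_toy_fatclique(blocks=3, sub_blocks=2, switches=4)
per_level = Counter(links.values())
degree = Counter(node for pair in links for node in pair)
print(dict(per_level))       # links at each hierarchy level
print(set(degree.values()))  # every switch uses the same number of ports
```

With 3 blocks, 2 sub-blocks per block, and 4 switches per sub-block, this yields 36 sub-block links, 12 block links, and 24 inter-block links, and every switch has degree 6. The regularity is the point: links between the same pair of blocks or sub-blocks can be bundled together.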
Evaluation Results
Compared to Clos and expander topologies across different scales:
- FatClique uses 50% fewer switches and 33% fewer patch panels at large scale
- 23–36% lower cabling costs due to reduced switches/cables
- Enables faster expansions at higher capacity SLOs due to the fat edge design
The structured, fat-edged hierarchy of FatClique preserves high performance while drastically reducing deployment and expansion complexity for large datacenter networks.
Conclusion
While performance is crucial, this paper highlights the importance of also optimizing for lifecycle management complexity in massive datacenter networks. The proposed metrics quantify complexity and expose the drawbacks of traditional topologies. FatClique demonstrates that an intelligent, structured design can maintain performance while improving manageability. Designing for simplicity and manageability will become even more critical as networks continue to scale.