Paper Review: Achieving High Utilization with Software-Driven WAN

ztex, Tony, Liu
5 min readMar 15, 2024

--

解決思路: Demand Shifting

Background: Multi-protocol Label Switching (MPLS)

Label-switching

  • fast forwarding on the label in the packet header
  • header between layer 2 and layer 3 headers
  • also has a class-of-service field for priority

MPLS Forwarding

  • Label-switched routers (LSRs)
  • Ingress LSR inserts labels in the MPLS header.
  • All LSRs forward on labels may rewrite labels.
  • Egress LSR removes labels.

Benefit: We can use label forwarding to do traffic engineering

不用 TE -> 不能走 longer path -> 需要 tunneling (Google: IP-in-IP tunneling, Microsoft: MPLS tunneling)

MPLS for Traffic Engineering

  1. 透過 OSPF 廣播找到 A->C 的最短路徑,該路徑必須 ≥ 40Mbps
  2. A 告訴 B 保留 40Mbps, B 告訴 C 保留 40Mbps
  3. Router A, B, C 對應設置 label table (signaling: RSVP-TE) 以建立從 A->C 的 tunnel
  4. 開始傳輸

Problems of MPLS TE

Inter-DC WANs suffer from two key problems today.

Poor efficiency:

The amount of traffic the WAN carries tends to be low compared to capacity. For a production inter-DC WAN, which we call IDN (§6.1), we find that the average utilization of half the links is under 30% and of three in four links is under 50%.

Why?

  1. services send whenever and however much traffic they want, without regard to the current state of the network or other services
  2. the local, greedy resource allocation model of MPLS TE is inefficient.

Drawback: Inefficiency

因為原本用 Greedy Approach, 給定同樣的 topology 可能會因為 order 不同得到 sub-optimal allocation

Drawback: Fairness

MPLS 沒辦法保證 fairness, 可能造成某些節點被 allocated 更多 capacity

為了達到 fair allocation: 需要 carefully select path

SWAN Design Goals:

Goals:

  1. 優先級
  2. 同一個優先級:Fair sharing within each class

Approach: SDN-based

  • Services 告訴 controller demands (= Google’s Bandenforcer 一樣)
  • Controller allocates paths 來滿足 (= Google’s TE server)
  • … Controller programs switches (= Google’s Gateway)

Challenge: Computing Allocations

Optimal global allocations must:

How SWAN address: fast, approximately-fair algorithm

Challenge: Congestion-Free Label Updates

當 routing changes -> update routing label-> 如果沒有 careful ordering, 造成 congestion

Approach: deriving a congestion-free ordering

Challenge: Switch Table Capacity

Switches have limited space for label tables, 可超過 table size

SWAN Components

Hosting:

  • tracks demands, and tell broker
  • rate-limiting

Broker (= Google’s Bandwidth Enforcer):

  • Aggregate service demand
  • Tell controller
  • Instructs hosts to allocated/limit rate

Network agents (= Google’s Gateway)

  • Convey topology sate to controller (= Google’s TE server)
  • Track topology and traffic with the aid of switches.

Controller:

  • Computer allocations to services
  • Changes forwarding state
  • Compute the service allocations and forwarding plane configuration for the network
  • Signal new allocations to services whose allocation has decreased. Wait for Th seconds for the service to lower its sending rate.
  • Change the forwarding state and then signal the new allocations to services whose allocation has increased.

Computing Allocations: Inputs and Outputs

Inputs:

  • demands between each DC pair
  • tunnels between each DC pair
  • link capacity of each link

Output: fraction of flow i over tunnel j

Goal

maximize network utilization while max-min fairness among same priority services

Allocation LP:

  • MCF (multi-commodity flow) function that maximizes the overall throughput while preferring shorter paths
  • sPri is the fraction of scratch link capacity that enables congestion-managed network updates
  • SWAN also ensures that higher priority traffic is likelier to use shorter paths
  • To speed up: by invoking MCF in T steps, with the constraint that at step
    k
    , flows are allocated rates in the range [αk−1U,αkU], but
    no more than their demand
  • runs the LP at the granularity of DCs instead of switches
  • SWAN aggregates the demands from all services in the same priority class between a pair of DCs. This reduces the number of flows
  • Parameters α and U trade-off unfairness for runtime

What’s max-min fairness?

  1. No user is allocated more demand
  2. Among all users whose demand is not satisfied, the resulting allocation has the highest min allocation

Say we have 10 cookes, Demands: A:2, B:5, C:6

Allocation 1: 1, 4, 5

Allocation 2: 2, 3, 5

Allocation 3: 2, 4, 4

Allocation 3 is more fair than Allocation 2 because the min-values are 3 and 4. and 3 < 4

Building Block: Multi-Commodity Flow

Maximize total allocation, subject to

  • demand constraints
  • capacity constraints

Two wrinkles

  • add constraint to limit allocation within bounds
  • leave scratch capacity for congestion-free updates

Maximizing Throughput

For each priority class

  • maximize throughput using MCF

By doing this in priority order

  • high priority traffc is allocated shorter paths

Ensuring Table Size Constraint

Problem:Routers limited routing table sizes

Solution: Run Optimization to find tunnels set first. Rule out tunnels carrying less traffic and being to long. Run optimization again

B4 v.s. SWAN

As in SWAN, B4 uses SDNs in the context of inter-DC WANs.

Although this parallel work shares a similar high-level architecture, it addresses different challenges.

While B4 develops custom switches and mechanisms to integrate existing routing protocols (BGP+ISIS)in an SDN environment, SWAN develops mechanisms for congestion-free data plane updates and for effectively using the limited forwarding table capacity of commodity switches.

--

--

ztex, Tony, Liu

Incoming-Intern, CPU emulation software @Apple, Ex-SDE @Amazon. Working on embedded system, Free-RTOS, RISC-V etc.