TITLE: Revisiting Network Transport in Evolving Datacenters
Datacenters are evolving quickly, with more diverse applications, more hardware accelerators, and new architectures. These changes pose significant challenges to existing network transport layers because they generate heavier and more dynamic traffic while demanding even more extreme transport performance. The transport layer is also under pressure from the other side: the physical network, which is itself evolving toward larger scale and higher speed. To meet these stringent requirements, operators today often spend months diagnosing transport anomalies and tuning transport features; this labor-intensive process will only get worse as datacenters continue to evolve. In this talk, I will show how to fundamentally change transport designs to adapt to this rapid evolution.
To simplify diagnosing existing transport protocols (i.e., TCP) in large-scale networks, I designed DETER, which deterministically replays problematic TCP connections using lightweight information recorded at runtime. The key challenge is the chain reaction between TCP and the network, which can cause a replay to diverge completely from the original execution. DETER isolates each connection’s replay by capturing its interactions with applications and the network. DETER is lightweight and can help diagnose many transport problems in large systems (e.g., Spark).
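The record-and-replay idea can be illustrated with a toy example: if a connection's behavior is a deterministic function of the inputs it receives from the application and the network, then logging those inputs is enough to reproduce its state later, in isolation. The sketch below is a generic illustration only, not DETER's implementation; the `ToyConnection` class, its event names, and its window rules are all hypothetical.

```python
class ToyConnection:
    """A deterministic toy sender whose state depends only on the events it sees."""

    def __init__(self):
        self.cwnd = 10    # congestion window, in packets (hypothetical rule set)
        self.unsent = 0   # bytes the application has queued
        self.log = []     # recorded (event, value) pairs

    def handle(self, event, value=0, record=True):
        if record:
            self.log.append((event, value))
        if event == "app_write":    # input from the application
            self.unsent += value
        elif event == "ack":        # input from the network
            self.cwnd += 1
            self.unsent = max(0, self.unsent - value)
        elif event == "loss":       # input from the network
            self.cwnd = max(1, self.cwnd // 2)
        else:
            raise ValueError(f"unknown event: {event}")

    def state(self):
        return (self.cwnd, self.unsent)


def replay(log):
    """Rebuild a connection's state from a recorded log, with no live network."""
    conn = ToyConnection()
    for event, value in log:
        conn.handle(event, value, record=False)
    return conn


# Record a short "live" run, then replay it in isolation.
live = ToyConnection()
for event, value in [("app_write", 3000), ("ack", 1000), ("loss", 0), ("ack", 1000)]:
    live.handle(event, value)

replayed = replay(live.log)
print(replayed.state() == live.state())
```

The replayed connection never touches the network or the application; it sees only the logged inputs, which is what keeps one connection's replay from being disturbed by others.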
Going forward, transport protocols are increasingly being offloaded to hardware for ultra-low latency and high bandwidth (e.g., RDMA). To unleash the raw hardware performance at scale, I designed HPCC, a new high-speed transport that exploits new capabilities of switches and NICs. Unlike existing protocols, which rely on heuristics, HPCC precisely calculates the traffic rate that fully utilizes the bandwidth with near-zero queuing. It does so by precisely modeling hardware behavior, driven by link-load measurements from programmable switches. HPCC has been deployed in Alibaba Cloud and is supported by most switch and NIC vendors (e.g., Broadcom, Cisco, Mellanox).
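As a rough illustration of computing a sending rate from measured link load rather than from heuristics, the sketch below estimates one link's normalized utilization from its queue length and transmit rate, then scales the sender's window toward a target utilization. It is a simplified, single-link sketch in the spirit of this approach, not the deployed algorithm; the function names, the target utilization `eta`, the additive term `w_ai`, and the window cap are assumptions made here for illustration.

```python
def link_utilization(qlen_bytes, tx_rate_bps, bandwidth_bps, base_rtt_s):
    """Estimate the normalized in-flight load on one link.

    In-flight bytes are approximated as the queued bytes plus the bytes the
    link transmits during one base RTT, normalized by the bandwidth-delay
    product (BDP). A value above 1.0 means the link is overloaded.
    """
    bdp_bytes = bandwidth_bps / 8 * base_rtt_s
    return qlen_bytes / bdp_bytes + tx_rate_bps / bandwidth_bps


def next_window(window_bytes, utilization, max_window_bytes,
                eta=0.95, w_ai=1000.0):
    """Scale the window so the link settles near the target utilization eta.

    eta is kept slightly below 1.0 so the queue stays near zero; w_ai is a
    small additive term (both values are illustrative assumptions).
    """
    utilization = max(utilization, 1e-6)  # guard against an idle-link reading
    scaled = window_bytes * eta / utilization + w_ai
    return min(scaled, max_window_bytes)


# Example: a 100 Gbps link with a 10 microsecond base RTT (BDP of about 125 KB).
u = link_utilization(qlen_bytes=25_000, tx_rate_bps=100e9,
                     bandwidth_bps=100e9, base_rtt_s=10e-6)
w = next_window(window_bytes=125_000, utilization=u, max_window_bytes=125_000)
print(f"utilization={u:.2f}, next window={w:.0f} bytes")
```

Because the window is divided by the measured utilization, an overloaded link (utilization above `eta`) shrinks the window in a single step and an underloaded one grows it, instead of probing gradually.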
Yuliang Li is a Ph.D. candidate in computer science at Harvard, advised by Professor Minlan Yu. His research focuses on improving datacenter network performance and availability by co-designing diverse hardware and software components. Prior to Harvard, he received his bachelor’s degree from Tsinghua University. He is a Siebel Scholar.