Georgia Tech researchers have identified a new congestion problem in cloud networks and created a congestion control scheme, called Annulus, to alleviate the resulting slowdown. Annulus reduces datacenter bottlenecks by up to 3.5 times and improves datacenter traffic by 56x when the bottleneck is in the wide area network (WAN).
As the cloud continues to bring easy data storage to billions, operators are pushing their networks to peak efficiency to reduce costs. Datacenters are designed with the expectation that there will be more internal traffic than external traffic, but as more people join the cloud, that assumption no longer holds, according to the researchers.
As soon as the cloud became a prominent data storage method, cloud operators started leasing compute, storage, and network resources to their clients. The rise in demand for these resources means that networks are now more overloaded than ever.
Cloud providers’ solution to the influx of computers on the network was to allocate more resources than necessary. Yet this isn’t cost-effective, according to School of Computer Science (SCS) alumnus Ahmed Saeed.
The cloud is made of interconnected datacenters. One machine in a datacenter can send traffic to other machines in the same datacenter and to machines outside of it. This is further complicated by transmission delay, or latency, which is very low inside the datacenter. Latency between datacenters is much higher because data must travel across the country through the WAN.
When both types of traffic compete, the faster-reacting traffic shoulders the burden of responding to any change in network conditions, which harms overall performance. This problem can be ignored when demand from the slower traffic is low, but it becomes more prominent as that demand increases.
“When the bandwidth demand of datacenter-to-datacenter traffic is small, it is not very costly to dedicate that bandwidth within the datacenter network,” Saeed said. “However, as that demand increases, dedicating resources can be wasteful.”
If the network is bottlenecked, datacenter traffic senses the problem almost immediately and compensates for it. WAN traffic takes much longer to react. In effect, the entire network becomes 2.5 times less efficient before all parts of the network are aware of the bottleneck.
“Information has to travel from one data center through the WAN and back,” said Saeed. “It’s problematic.”
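The cost of slow feedback can be illustrated with simple arithmetic: the longer a sender takes to learn about a bottleneck, the more excess data it pushes into the network in the meantime. The sketch below is only an illustration. The function name, link rates, and round-trip times are hypothetical values chosen for the example and are not taken from the Annulus paper.

```python
def excess_bytes(send_gbps: float, capacity_gbps: float, feedback_delay_s: float) -> float:
    """Bytes sent beyond link capacity before congestion feedback reaches the sender."""
    # Overload in bits per second; zero if the sender is within capacity.
    overload_bps = max(0.0, send_gbps - capacity_gbps) * 1e9
    # Excess bits accumulated during the feedback delay, converted to bytes.
    return overload_bps * feedback_delay_s / 8


# Hypothetical scenario: a sender pushes 12 Gb/s into a 10 Gb/s bottleneck.
DATACENTER_RTT_S = 100e-6  # assumed round trip inside a datacenter
WAN_RTT_S = 50e-3          # assumed cross-country round trip

datacenter_excess = excess_bytes(12, 10, DATACENTER_RTT_S)
wan_excess = excess_bytes(12, 10, WAN_RTT_S)

print(f"datacenter sender overshoots by about {datacenter_excess / 1e3:.0f} KB")
print(f"WAN sender overshoots by about {wan_excess / 1e6:.1f} MB")
```

Under these assumed numbers the WAN sender overshoots by hundreds of times more data than the datacenter sender, simply because its feedback takes longer to arrive.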
Congestion control is the part of the network responsible for managing traffic when demand for bandwidth exceeds available capacity. It is a fundamental concept in networking research and in how the internet functions. Yet the efficiency gap between datacenter traffic and WAN traffic has led to a new type of congestion, and the solution has to focus on reaction time, according to Saeed.
“It’s not a matter of how good your network signal is or how smart your algorithm is, but how fast you react,” he said.
With this in mind, the researchers worked to solve the problem inside the datacenter rather than at the endpoints. Annulus works by sending a message from the congested part of the network directly to the traffic source to let it know how to react, essentially cutting out the middleman.
Annulus has two control loops. One uses existing congestion control algorithms for bottlenecks at just one source, either the WAN or the datacenter. The other loop focuses on bottlenecks between the WAN and the datacenter and tries to solve the problem at the traffic source.
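The two-loop idea can be sketched as a toy sender whose rate is limited by whichever loop is currently more restrictive. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the class, its method names, the halving and additive-increase rule, and the rates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DualLoopSender:
    """Toy sender limited by two independent control loops (hypothetical model)."""

    line_rate: float = 100.0  # assumed link speed in Gb/s
    e2e_rate: float = 100.0   # rate allowed by the end-to-end loop
    near_rate: float = 100.0  # rate allowed by the near-source loop

    def on_end_to_end_feedback(self, congested: bool) -> None:
        # Loop 1: a stand-in for an existing congestion control algorithm.
        # Its feedback arrives only after a full round trip.
        if congested:
            self.e2e_rate = max(1.0, self.e2e_rate / 2)
        else:
            self.e2e_rate = min(self.line_rate, self.e2e_rate + 1.0)

    def on_direct_notification(self, suggested_rate: float) -> None:
        # Loop 2: a congested point near the source notifies the sender
        # directly, without waiting for the far end to echo the signal.
        self.near_rate = max(1.0, min(self.line_rate, suggested_rate))

    @property
    def sending_rate(self) -> float:
        # The sender obeys the more restrictive of the two loops.
        return min(self.e2e_rate, self.near_rate)


sender = DualLoopSender()
sender.on_direct_notification(40.0)  # fast, local signal
print(sender.sending_rate)
sender.on_end_to_end_feedback(congested=True)  # slow, end-to-end signal
print(sender.sending_rate)
```

Taking the minimum of the two rates is one simple way to combine the loops: the fast loop can throttle the sender right away, while the slower loop still governs bottlenecks it alone can observe.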
In testing, Annulus improved datacenter traffic by 56x when the bottleneck was in the WAN. It also reduced datacenter flow issues by up to 3.5x.
Saeed presented the research in the paper "Annulus: A Dual Congestion Control Loop for Datacenter and WAN Traffic Aggregates" at the ACM Special Interest Group on Data Communication (SIGCOMM) conference, held Aug. 10 to 14. He co-wrote the paper with SCS Professors Mostafa Ammar and Ellen Zegura, and with Varun Gupta, Prateesh Goyal, Milad Sharif, Rong Pan, Keon Jang, Mohammad Alizadeh, Abdul Kabbani, and Amin Vahdat.