Orion: Google's Software-Defined Networking Control Plane

AD Ferguson, S Gribble, CY Hong, C Killian… - … USENIX Symposium on …, 2021 - usenix.org
We present Orion, a distributed Software-Defined Networking platform deployed globally in
Google's datacenter (Jupiter) and Wide Area (B4) networks. Orion was designed around a …

Programmable calendar queues for high-speed packet scheduling

NK Sharma, C Zhao, M Liu, PG Kannan, C Kim… - … USENIX Symposium on …, 2020 - usenix.org
Packet schedulers traditionally focus on the prioritized transmission of packets. Scheduling
is often realized through coarse-grained queue-level priorities, as in today's switches, or …

Aequitas: Admission control for performance-critical RPCs in datacenters

Y Zhang, G Kumar, N Dukkipati, X Wu, P Jha… - Proceedings of the …, 2022 - dl.acm.org
With the increasing popularity of disaggregated storage and microservice architectures, high
fan-out and fan-in Remote Procedure Calls (RPCs) now generate most of the traffic in …

Congestion control in machine learning clusters

S Rajasekaran, M Ghobadi, G Kumar… - Proceedings of the 21st …, 2022 - dl.acm.org
This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …

Deepweave: Accelerating job completion time with deep reinforcement learning-based coflow scheduling

P Sun, Z Guo, J Wang, J Li, J Lan, Y Hu - Proceedings of the Twenty-Ninth …, 2021 - ijcai.org
To improve the processing efficiency of jobs in distributed computing, the concept of coflow
is proposed. A coflow is a collection of flows that are semantically correlated in a multi-stage …

Is advance knowledge of flow sizes a plausible assumption?

V Đukić, SA Jyothi, B Karlaš, M Owaida… - … USENIX Symposium on …, 2019 - usenix.org
Recent research has proposed several packet, flow, and coflow scheduling methods that
could substantially improve data center network performance. Most of this work assumes …

dcPIM: Near-optimal proactive datacenter transport

Q Cai, MT Arashloo, R Agarwal - Proceedings of the ACM SIGCOMM …, 2022 - dl.acm.org
Datacenter Parallel Iterative Matching (dcPIM) is a proactive data-center transport design
that simultaneously achieves near-optimal tail latency for short flows and near-optimal …

Packet order matters! Improving application performance by deliberately delaying packets

H Ghasemirahni, T Barbette, GP Katsikas… - … USENIX Symposium on …, 2022 - usenix.org
Data centers increasingly deploy commodity servers with high-speed network interfaces to
enable low-latency communication. However, achieving low latency at high data rates …

On the hardness and inapproximability of virtual network embeddings

M Rost, S Schmid - IEEE/ACM Transactions on Networking, 2020 - ieeexplore.ieee.org
Many resource allocation problems in the cloud can be described as a basic Virtual Network
Embedding Problem (VNEP): the problem of finding a mapping of a request graph …

Leveraging service meshes as a new network layer

S Ashok, PB Godfrey, R Mittal - Proceedings of the 20th ACM Workshop …, 2021 - dl.acm.org
As modern cloud services have scaled out, applications have moved from relatively
monolithic designs to highly modularized fleets of microservices that communicate among …