TPU v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings
In response to innovations in machine learning (ML) models, production workloads changed
radically and rapidly. TPU v4 is the fifth Google domain specific architecture (DSA) and its …
High-bandwidth density silicon photonic resonators for energy-efficient optical interconnects
The growth of artificial intelligence applications demands ever larger and more complex
deep learning models, dominating today's—and tomorrow's—data center and high …
Lightning: A reconfigurable photonic-electronic SmartNIC for fast and energy-efficient inference
The massive growth of machine learning-based applications and the end of Moore's law
have created a pressing need to redesign computing platforms. We propose Lightning, the …
Petabit-scale silicon photonic interconnects with integrated Kerr frequency combs
Silicon photonics holds significant promise in revolutionizing optical interconnects in data
centers and high performance computers to enable scaling into the Pb/s package escape …
Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions
With the widespread use of distributed machine learning (DML), many IT companies have
established networks dedicated to DML. Different communication architectures of DML have …
Cerberus: The power of choices in datacenter topology design - a throughput perspective
The bandwidth and latency requirements of modern datacenter applications have led
researchers to propose various topology designs using static, dynamic demand-oblivious …
CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …
Lightwave fabrics: at-scale optical circuit switching for datacenter and machine learning systems
We describe our experience developing what we believe to be the world's first large-scale
production deployments of lightwave fabrics used for both datacenter networking and …
GRID: Gradient routing with in-network aggregation for distributed training
As the scale of distributed training increases, it brings huge communication overhead in
clusters. Some works try to reduce the communication cost through gradient compression or …
Congestion control in machine learning clusters
This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …