QUIC-FL: Quick Unbiased Compression for Federated Learning
Distributed Mean Estimation (DME), in which $ n $ clients communicate vectors to a
parameter server that estimates their average, is a fundamental building block in …
Accelerating Distributed Deep Learning using Lossless Homomorphic Compression
As deep neural networks (DNNs) grow in complexity and size, the resultant increase in
communication overhead during distributed training has become a significant bottleneck …
Optimal and Near-Optimal Adaptive Vector Quantization
Quantization is a fundamental optimization for many machine-learning use cases, including
compressing gradients, model weights and activations, and datasets. The most accurate …
Beyond Throughput and Compression Ratios: Towards High End-to-end Utility of Gradient Compression
Gradient aggregation has long been identified as a major bottleneck in today's large-scale
distributed machine learning training systems. One promising solution to mitigate such …
Zero-Delay QKV Compression for Mitigating KV Cache and Network Bottlenecks in LLM Inference
Z Zhang, H Shen - arXiv preprint arXiv:2408.04107, 2024 - arxiv.org
In large-language models, memory constraints in the key-value cache (KVC) pose a
challenge during inference, especially with long prompts. In this work, we observed that …
Accelerating Federated Learning with Quick Distributed Mean Estimation
Distributed Mean Estimation (DME), in which $ n $ clients communicate vectors to a
parameter server that estimates their average, is a fundamental building block in …
Telemetry for Next-Generation Networks
J Langlet - 2024 - qmro.qmul.ac.uk
Software-defined networking enables tight integration between packet-processing hardware
and centralized controllers, highlighting the importance of deep network insight for informed …
Approximate Computing and In-Memory Computing: The Best of the Two Worlds!
MEF Essa - 2024 - search.proquest.com
Machine learning (ML) has become ubiquitous, integrating into numerous real-life
applications. However, meeting the computational demands of ML systems is challenging …