PyTorch FSDP: experiences on scaling fully sharded data parallel
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …
Alpa: Automating inter- and intra-operator parallelism for distributed deep learning
Alpa automates model-parallel training of large deep learning (DL) models by generating
execution plans that unify data, operator, and pipeline parallelism. Existing model-parallel …
Fast distributed inference serving for large language models
Large language models (LLMs) power a new generation of interactive AI applications
exemplified by ChatGPT. The interactive nature of these applications demands low job …
Ekko: A large-scale deep learning recommender system with low-latency model update
Deep Learning Recommender Systems (DLRSs) need to update models at low latency, thus
promptly serving new users and content. Existing DLRSs, however, fail to do so. They …
DRIVE: One-bit distributed mean estimation
We consider the problem where $n$ clients transmit $d$-dimensional real-valued vectors
using $d(1+o(1))$ bits each, in a manner that allows the receiver to approximately …
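A minimal sketch of the rotate-then-sign-quantize idea behind such one-bit schemes, assuming a shared pseudo-random rotation and a least-squares scale; the helper names and the scale choice here are illustrative assumptions, not DRIVE's exact construction:

```python
import numpy as np

def random_rotation(d, seed):
    # Shared pseudo-random orthogonal matrix (QR of a Gaussian draw);
    # sender and receiver derive it from the same seed.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def encode(x, R):
    # Rotate, then keep only signs plus one scalar scale: ~1 bit/coordinate.
    z = R @ x
    scale = np.linalg.norm(z, 1) / len(z)  # argmin_c ||z - c*sign(z)||_2
    return np.sign(z), scale

def decode(signs, scale, R):
    # Unrotate the scaled sign vector to estimate the original input.
    return R.T @ (scale * signs)

# The receiver averages the n per-client estimates to approximate the mean.
d, n = 256, 8
R = random_rotation(d, seed=0)
clients = [np.random.default_rng(i + 1).standard_normal(d) for i in range(n)]
est = np.mean([decode(*encode(x, R), R) for x in clients], axis=0)
true_mean = np.mean(clients, axis=0)
print(np.linalg.norm(est - true_mean) / np.linalg.norm(true_mean))
```

Averaging across clients cancels much of the per-client quantization error; the paper's guarantees concern its specific rotation and scaling, which this sketch only approximates.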
Graft: Efficient inference serving for hybrid deep learning with SLO guarantees via DNN re-alignment
Deep neural networks (DNNs) have been widely adopted for various mobile inference tasks,
yet their ever-increasing computational demands are hindering their deployment on …
DRAGONN: Distributed randomized approximate gradients of neural networks
Data-parallel distributed training (DDT) has become the de-facto standard for accelerating
the training of most deep learning tasks on massively parallel hardware. In the DDT …
PervasiveFL: Pervasive federated learning for heterogeneous IoT systems
Federated learning (FL) has been recognized as a promising collaborative on-device
machine learning method in the design of Internet of Things (IoT) systems. However, most …
Hi-speed DNN training with Espresso: Unleashing the full potential of gradient compression with near-optimal usage strategies
Gradient compression (GC) is a promising approach to addressing the communication
bottleneck in distributed deep learning (DDL). It saves communication time, but also …
High dimensional statistical estimation under uniformly dithered one-bit quantization
In this paper, we propose a uniformly dithered 1-bit quantization scheme for high-
dimensional statistical estimation. The scheme contains truncation, dithering, and …
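A small numerical sketch of the truncate-dither-quantize pipeline named above. It relies on the standard dithering identity E[λ·sign(x + u)] = x for u ~ Uniform(−λ, λ) and |x| ≤ λ; the function names and the choice λ = 1 are illustrative assumptions, not the paper's estimators:

```python
import numpy as np

def dithered_one_bit(x, lam, rng):
    # Truncate to [-lam, lam], add Uniform(-lam, lam) dither, keep the sign.
    xt = np.clip(x, -lam, lam)
    u = rng.uniform(-lam, lam, size=x.shape)
    return np.sign(xt + u)

def reconstruct(bits, lam):
    # E[lam * sign(x + u)] = x whenever |x| <= lam, so scaling the bits by
    # lam yields an unbiased (though noisy) estimate of the truncated input.
    return lam * bits

# Averaging many independent 1-bit measurements recovers the signal.
rng = np.random.default_rng(0)
x = np.array([0.3, -0.7, 0.1])
lam = 1.0
est = np.mean([reconstruct(dithered_one_bit(x, lam, rng), lam)
               for _ in range(20000)], axis=0)
print(est)  # close to [0.3, -0.7, 0.1]
```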