Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Edge learning: The enabling technology for distributed big data analytics in the edge

J Zhang, Z Qu, C Chen, H Wang, Y Zhan, B Ye… - ACM Computing …, 2021 - dl.acm.org
Machine Learning (ML) has demonstrated great promise in various fields, e.g., self-driving,
smart city, which are fundamentally altering the way individuals and organizations live, work …

Towards scalable distributed training of deep learning on public cloud clusters

S Shi, X Zhou, S Song, X Wang, Z Zhu… - Proceedings of …, 2021 - proceedings.mlsys.org
Distributed training techniques have been widely deployed in large-scale deep models
training on dense-GPU clusters. However, on public cloud clusters, due to the moderate …

Geryon: Accelerating distributed CNN training by network-level flow scheduling

S Wang, D Li, J Geng - IEEE INFOCOM 2020-IEEE Conference …, 2020 - ieeexplore.ieee.org
Increasingly rich data sets and complicated models make distributed machine learning more
and more important. However, the cost of extensive and frequent parameter …

Communication optimization strategies for distributed deep neural network training: A survey

S Ouyang, D Dong, Y Xu, L Xiao - Journal of Parallel and Distributed …, 2021 - Elsevier
Recent trends in high-performance computing and deep learning have led to the
proliferation of studies on large-scale deep neural network training. However, the frequent …

Enabling all in-edge deep learning: A literature review

P Joshi, M Hasanuzzaman, C Thapa, H Afli… - IEEE Access, 2023 - ieeexplore.ieee.org
In recent years, deep learning (DL) models have demonstrated remarkable achievements
on non-trivial tasks such as speech recognition, image processing, and natural language …

Distributed learning systems with first-order methods

J Liu, C Zhang - Foundations and Trends® in Databases, 2020 - nowpublishers.com
Scalable and efficient distributed learning is one of the main driving forces behind the recent
rapid advancement of machine learning and artificial intelligence. One prominent feature of …

FedBC: blockchain-based decentralized federated learning

X Wu, Z Wang, J Zhao, Y Zhang… - 2020 IEEE international …, 2020 - ieeexplore.ieee.org
Federated learning enables participants to collaborate on model training without directly
exchanging raw data. Existing federated learning methods often follow the parameter server …

Robust searching-based gradient collaborative management in intelligent transportation system

H Shi, H Wang, R Ma, Y Hua, T Song, H Gao… - ACM Transactions on …, 2023 - dl.acm.org
With the rapid development of big data and the Internet of Things (IoT), traffic data from an
Intelligent Transportation System (ITS) is becoming more and more accessible. To …

Peta-scale embedded photonics architecture for distributed deep learning applications

Z Wu, LY Dai, A Novick, M Glick, Z Zhu… - Journal of Lightwave …, 2023 - ieeexplore.ieee.org
As Deep Learning (DL) models grow larger and more complex, training jobs are
increasingly distributed across multiple Computing Units (CU) such as GPUs and TPUs …