Demystifying parallel and distributed deep learning: An in-depth concurrency analysis

T Ben-Nun, T Hoefler - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
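The survey's scope is the space of concurrency strategies for DNN training; as a concrete point of reference, the sketch below shows synchronous data-parallel SGD with gradient averaging, one of the standard techniques in that space. The toy model, shard sizes, and worker count are illustrative assumptions, not code from the survey.

```python
# Minimal sketch of synchronous data-parallel SGD: each worker computes a
# gradient on its own shard, the gradients are averaged ("all-reduce"),
# and every replica applies the same update.
import numpy as np

rng = np.random.default_rng(0)
num_workers, dim, lr = 4, 8, 0.1
w = rng.normal(size=dim)                      # shared model replica

def local_gradient(w, x, y):
    # Gradient of a least-squares loss on one worker's mini-batch.
    return x.T @ (x @ w - y) / len(y)

for step in range(100):
    grads = []
    for k in range(num_workers):              # each worker sees its own shard
        x = rng.normal(size=(32, dim))
        y = x @ np.ones(dim) + 0.01 * rng.normal(size=32)
        grads.append(local_gradient(w, x, y))
    w -= lr * np.mean(grads, axis=0)          # averaged update, applied everywhere
```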

Neuroevolution in deep neural networks: Current trends and future challenges

E Galván, P Mooney - IEEE Transactions on Artificial …, 2021 - ieeexplore.ieee.org
A variety of methods have been applied to the architectural configuration and learning or
training of artificial deep neural networks (DNNs). These methods play a crucial role in the …
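Neuroevolution applies evolutionary search to network architectures, hyperparameters, and weights; the toy loop below sketches the general idea with a synthetic fitness standing in for validation accuracy. The genome encoding, mutation rates, and population size are illustrative assumptions, not material from the paper.

```python
# Toy neuroevolution loop: evolve a small architecture encoding (layer widths)
# against a synthetic fitness function; purely illustrative of the approach.
import random

def fitness(genome):
    # Stand-in for validation accuracy: prefer moderate depth and width.
    return -abs(len(genome) - 3) - sum(abs(w - 64) for w in genome) / 100

def mutate(genome):
    g = list(genome)
    if random.random() < 0.3 and len(g) < 6:
        g.append(random.choice([16, 32, 64, 128]))      # add a layer
    elif random.random() < 0.3 and len(g) > 1:
        g.pop(random.randrange(len(g)))                  # remove a layer
    else:
        i = random.randrange(len(g))
        g[i] = max(8, g[i] + random.choice([-16, 16]))   # resize a layer
    return g

population = [[random.choice([16, 32, 64, 128])] for _ in range(10)]
for generation in range(30):
    population.sort(key=fitness, reverse=True)
    parents = population[:5]                             # truncation selection
    population = parents + [mutate(random.choice(parents)) for _ in range(5)]

print("best architecture (layer widths):", max(population, key=fitness))
```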

Training compute-optimal large language models

J Hoffmann, S Borgeaud, A Mensch… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
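The headline result of this line of work is that, for a fixed training compute budget of roughly C ≈ 6·N·D FLOPs, parameters N and training tokens D should be scaled in about equal proportion. The back-of-envelope sketch below illustrates that allocation; the 20-tokens-per-parameter ratio is the commonly quoted approximation, not an exact value from the paper.

```python
# Back-of-envelope compute-optimal allocation: with C ≈ 6*N*D FLOPs and the
# approximate rule D ≈ 20*N, both N and D grow like sqrt(C).
def compute_optimal(c_flops, tokens_per_param=20.0):
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for c in (1e21, 1e23, 1e25):
    n, d = compute_optimal(c)
    print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")
```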

An empirical analysis of compute-optimal large language model training

J Hoffmann, S Borgeaud, A Mensch… - Advances in …, 2022 - proceedings.neurips.cc
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …

Federated learning on non-IID data: A survey

H Zhu, J Xu, S Liu, Y Jin - Neurocomputing, 2021 - Elsevier
Federated learning is an emerging distributed machine learning framework for privacy
preservation. However, models trained in federated learning usually have worse …

Federated learning with buffered asynchronous aggregation

J Nguyen, K Malik, H Zhan… - International …, 2022 - proceedings.mlr.press
Scalability and privacy are two critical concerns for cross-device federated learning (FL)
systems. In this work, we identify that synchronous FL cannot scale efficiently beyond a few …
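Buffered asynchronous aggregation sits between fully synchronous and fully asynchronous FL: client updates are accumulated in a server-side buffer and applied only once the buffer fills. The sketch below illustrates that aggregation rule; the buffer size, toy model, and simulated client updates are illustrative assumptions, not the paper's system.

```python
# Sketch of buffered asynchronous aggregation: apply client updates to the
# server model only after K of them have accumulated.
import numpy as np

rng = np.random.default_rng(0)
dim, buffer_size, server_lr = 16, 4, 1.0
server_model = np.zeros(dim)
buffer = []

def client_update(model):
    # Stand-in for a client's local training: a noisy step toward a target.
    target = np.ones(dim)
    return 0.1 * (target - model) + 0.01 * rng.normal(size=dim)

for arrival in range(40):                    # updates arrive one at a time
    buffer.append(client_update(server_model))
    if len(buffer) == buffer_size:           # aggregate once the buffer fills
        server_model += server_lr * np.mean(buffer, axis=0)
        buffer.clear()
```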

Scaling laws for neural language models

J Kaplan, S McCandlish, T Henighan, TB Brown… - arXiv preprint arXiv …, 2020 - arxiv.org
We study empirical scaling laws for language model performance on the cross-entropy loss.
The loss scales as a power-law with model size, dataset size, and the amount of compute …
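The reported relationship has the form L(X) ≈ (X_c / X)^α, where X is model size, dataset size, or compute. The snippet below just traces out that power-law shape; the constants are placeholders, not the paper's fitted coefficients.

```python
# Power-law loss curve of the form L(X) = (X_c / X) ** alpha.
# Constants are placeholders chosen to show the shape, not fitted values.
def power_law_loss(x, x_c=5.4e13, alpha=0.095):
    return (x_c / x) ** alpha

for n_tokens in (1e9, 1e10, 1e11, 1e12):
    print(f"D={n_tokens:.0e} tokens -> loss ~ {power_law_loss(n_tokens):.3f}")
```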

ZeRO-Offload: Democratizing billion-scale model training

J Ren, S Rajbhandari, RY Aminabadi… - 2021 USENIX Annual …, 2021 - usenix.org
Large-scale model training has been a playing field for a limited few requiring complex
model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload …
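ZeRO-Offload's core idea is to keep the forward and backward passes on the GPU while holding fp32 master parameters and optimizer state in CPU memory and running the optimizer step there. Below is a minimal PyTorch sketch of that general pattern; it is an illustration of the offload technique under those assumptions, not the paper's actual implementation or API.

```python
# Sketch of CPU offload: compute on the accelerator, but keep fp32 master
# parameters and Adam state on the CPU and run the optimizer step there.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"   # CPU fallback for the demo
model = torch.nn.Linear(1024, 1024).to(device)

# fp32 master copies and the optimizer state live in CPU memory.
cpu_master = [p.detach().float().cpu() for p in model.parameters()]
optimizer = torch.optim.Adam(cpu_master, lr=1e-3)

for step in range(3):
    x = torch.randn(32, 1024, device=device)
    loss = model(x).pow(2).mean()
    loss.backward()                                        # gradients on the device

    for p, m in zip(model.parameters(), cpu_master):
        m.grad = p.grad.detach().float().cpu()             # offload gradients
    optimizer.step()                                        # optimizer update on the CPU
    optimizer.zero_grad()
    model.zero_grad()

    for p, m in zip(model.parameters(), cpu_master):
        p.data.copy_(m.to(device))                          # copy updated params back
```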

Energy efficient federated learning over wireless communication networks

Z Yang, M Chen, W Saad, CS Hong… - IEEE Transactions …, 2020 - ieeexplore.ieee.org
In this paper, the problem of energy efficient transmission and computation resource
allocation for federated learning (FL) over wireless communication networks is investigated …
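Work in this area typically trades per-round energy (local computation plus uplink transmission) against latency and learning performance. The sketch below encodes a generic per-round energy model of that kind; the formulation and all constants are illustrative assumptions, not the paper's exact model or values.

```python
# Generic per-round FL energy model: computation energy scales with CPU
# frequency squared, and transmission energy is transmit power times the time
# needed to upload the model update at the Shannon rate.
import math

def round_energy(cycles_per_sample=1e4, samples=500, freq_hz=1e9,
                 kappa=1e-28, tx_power_w=0.2, bandwidth_hz=1e6,
                 channel_gain=1e-7, noise_w=1e-13, update_bits=1e6):
    cycles = cycles_per_sample * samples
    e_compute = kappa * cycles * freq_hz ** 2              # local training energy
    t_compute = cycles / freq_hz

    rate = bandwidth_hz * math.log2(1 + tx_power_w * channel_gain / noise_w)
    t_tx = update_bits / rate                               # uplink time
    e_tx = tx_power_w * t_tx                                # uplink energy
    return e_compute + e_tx, t_compute + t_tx

energy, latency = round_energy()
print(f"per-round energy ~ {energy:.4f} J, latency ~ {latency:.3f} s")
```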

Measuring the effects of non-identical data distribution for federated visual classification

TMH Hsu, H Qi, M Brown - arXiv preprint arXiv:1909.06335, 2019 - arxiv.org
Federated Learning enables visual models to be trained in a privacy-preserving way using
real-world data from mobile devices. Given their distributed nature, the statistics of the data …
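A common way in this line of work to synthesize non-identical client distributions is to draw each client's class proportions from a Dirichlet distribution with concentration α, where smaller α means stronger label skew. The sketch below shows that partitioning scheme; the client count, sample sizes, and α are illustrative.

```python
# Dirichlet-based label skew: each client's class proportions are sampled
# from Dirichlet(alpha); sample counts per class follow a multinomial draw.
import numpy as np

rng = np.random.default_rng(0)
num_clients, num_classes, samples_per_client, alpha = 5, 10, 100, 0.5

for client in range(num_clients):
    class_probs = rng.dirichlet(alpha * np.ones(num_classes))
    counts = rng.multinomial(samples_per_client, class_probs)
    print(f"client {client}: per-class sample counts {counts}")
```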