Demystifying parallel and distributed deep learning: An in-depth concurrency analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing
applications. Accelerating their training is a major challenge and techniques range from …
Neuroevolution in deep neural networks: Current trends and future challenges
A variety of methods have been applied to the architectural configuration and learning or
training of artificial deep neural networks (DNNs). These methods play a crucial role in the …
Training compute-optimal large language models
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
An empirical analysis of compute-optimal large language model training
We investigate the optimal model size and number of tokens for training a transformer
language model under a given compute budget. We find that current large language models …
Federated learning with buffered asynchronous aggregation
Scalability and privacy are two critical concerns for cross-device federated learning (FL)
systems. In this work, we identify that synchronous FL cannot scale efficiently beyond a few …
Scaling laws for neural language models
We study empirical scaling laws for language model performance on the cross-entropy loss.
The loss scales as a power-law with model size, dataset size, and the amount of compute …
ZeRO-Offload: Democratizing billion-scale model training
J Ren, S Rajbhandari, RY Aminabadi… - 2021 USENIX Annual …, 2021 - usenix.org
Large-scale model training has been a playing ground for a limited few requiring complex
model refactoring and access to prohibitively expensive GPU clusters. ZeRO-Offload …
Energy efficient federated learning over wireless communication networks
In this paper, the problem of energy efficient transmission and computation resource
allocation for federated learning (FL) over wireless communication networks is investigated …
Measuring the effects of non-identical data distribution for federated visual classification
Federated Learning enables visual models to be trained in a privacy-preserving way using
real-world data from mobile devices. Given their distributed nature, the statistics of the data …