Efficient large-scale language model training on GPU clusters using Megatron-LM

D Narayanan, M Shoeybi, J Casper… - Proceedings of the …, 2021 - dl.acm.org
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …

Improving robustness using generated data

S Gowal, SA Rebuffi, O Wiles… - Advances in …, 2021 - proceedings.neurips.cc
Recent work argues that robust training requires substantially larger datasets than those
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …

Learning to simulate complex physics with graph networks

A Sanchez-Gonzalez, J Godwin… - International …, 2020 - proceedings.mlr.press
Here we present a machine learning framework and model implementation that can learn to
simulate a wide variety of challenging physical domains, involving fluids, rigid solids, and …

RetinaTrack: Online single stage joint detection and tracking

Z Lu, V Rathod, R Votel… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Traditionally multi-object tracking and object detection are performed using separate
systems with most prior works focusing exclusively on one of these aspects over the other …

Networking systems of AI: On the convergence of computing and communications

L Song, X Hu, G Zhang, P Spachos… - IEEE Internet of …, 2022 - ieeexplore.ieee.org
Artificial intelligence (AI) and 5G system have been two hot technical areas that are
changing the world. On the deep convergence of computing and communication, networking …

Context R-CNN: Long term temporal context for per-camera object detection

S Beery, G Wu, V Rathod, R Votel… - Proceedings of the …, 2020 - openaccess.thecvf.com
In static monitoring cameras, useful contextual information can stretch far beyond the few
seconds typical video understanding models might see: subjects may exhibit similar …

Improving 3D object detection through progressive population based augmentation

S Cheng, Z Leng, ED Cubuk, B Zoph, C Bai… - Computer Vision–ECCV …, 2020 - Springer
Data augmentation has been widely adopted for object detection in 3D point clouds.
However, all previous related efforts have focused on manually designing specific data …

Lightweight Deep Learning for Resource-Constrained Environments: A Survey

HI Liu, M Galindo, H Xie, LK Wong, HH Shuai… - ACM Computing …, 2024 - dl.acm.org
Over the past decade, the dominance of deep learning has prevailed across various
domains of artificial intelligence, including natural language processing, computer vision …

A large batch optimizer reality check: Traditional, generic optimizers suffice across batch sizes

Z Nado, JM Gilmer, CJ Shallue, R Anil… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently the LARS and LAMB optimizers have been proposed for training neural networks
faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update …

KAISA: an adaptive second-order optimizer framework for deep neural networks

JG Pauloski, Q Huang, L Huang… - Proceedings of the …, 2021 - dl.acm.org
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge
faster in deep neural network (DNN) training than stochastic gradient descent (SGD); …