Efficient large-scale language model training on GPU clusters using Megatron-LM
Large language models have led to state-of-the-art accuracies across several tasks.
However, training these models efficiently is challenging because: a) GPU memory capacity …
Improving robustness using generated data
Recent work argues that robust training requires substantially larger datasets than those
required for standard classification. On CIFAR-10 and CIFAR-100, this translates into a …
Learning to simulate complex physics with graph networks
A Sanchez-Gonzalez, J Godwin… - International …, 2020 - proceedings.mlr.press
Here we present a machine learning framework and model implementation that can learn to
simulate a wide variety of challenging physical domains, involving fluids, rigid solids, and …
RetinaTrack: Online single stage joint detection and tracking
Traditionally, multi-object tracking and object detection are performed using separate
systems with most prior works focusing exclusively on one of these aspects over the other …
Networking systems of AI: On the convergence of computing and communications
Artificial intelligence (AI) and 5G systems have been two hot technical areas that are
changing the world. On the deep convergence of computing and communication, networking …
Context R-CNN: Long term temporal context for per-camera object detection
In static monitoring cameras, useful contextual information can stretch far beyond the few
seconds typical video understanding models might see: subjects may exhibit similar …
Improving 3D object detection through progressive population based augmentation
Data augmentation has been widely adopted for object detection in 3D point clouds.
However, all previous related efforts have focused on manually designing specific data …
Lightweight Deep Learning for Resource-Constrained Environments: A Survey
Over the past decade, the dominance of deep learning has prevailed across various
domains of artificial intelligence, including natural language processing, computer vision …
A large batch optimizer reality check: Traditional, generic optimizers suffice across batch sizes
Recently the LARS and LAMB optimizers have been proposed for training neural networks
faster using large batch sizes. LARS and LAMB add layer-wise normalization to the update …
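The LARS/LAMB snippet mentions layer-wise normalization of the update only in passing. As a rough illustration, the Python sketch below assumes the trust-ratio form commonly associated with LARS (per-layer rate = eta * ||w|| / (||g|| + weight_decay * ||w||)), applied to a single layer stored as a plain list, with momentum omitted. The names `lars_step`, `l2_norm`, and `eta`, and the zero-norm fallback, are hypothetical choices for this example, not code from the listed paper. LAMB applies a similar per-layer ratio to an Adam-style update, which is not shown here.

```python
import math
from typing import List


def l2_norm(v: List[float]) -> float:
    """Euclidean norm of a flat list of parameters or gradients."""
    return math.sqrt(sum(x * x for x in v))


def lars_step(
    weights: List[float],
    grads: List[float],
    lr: float,
    eta: float = 0.001,
    weight_decay: float = 0.0,
) -> List[float]:
    """One simplified LARS-style update for a single layer (no momentum).

    Hypothetical sketch: the layer's step is rescaled by a trust ratio
    eta * ||w|| / (||g|| + weight_decay * ||w||), so its size tracks the
    layer's weight norm rather than the raw gradient magnitude.
    """
    w_norm = l2_norm(weights)
    g_norm = l2_norm(grads)
    denom = g_norm + weight_decay * w_norm
    # Assumed fallback: use a ratio of 1.0 when either norm is zero.
    trust = eta * w_norm / denom if w_norm > 0 and denom > 0 else 1.0
    # Apply the scaled (and optionally weight-decayed) gradient step.
    return [w - lr * trust * (g + weight_decay * w) for w, g in zip(weights, grads)]


if __name__ == "__main__":
    w = [3.0, 4.0]  # ||w|| = 5
    print(lars_step(w, [0.0, 10.0], lr=1.0, eta=0.1))
    print(lars_step(w, [0.0, 1000.0], lr=1.0, eta=0.1))  # same step size
```

With `weight_decay=0`, multiplying a layer's gradient by a constant leaves the size of its update unchanged, since the trust ratio divides the scale back out. This per-layer rescaling is the usual motivation given for using a single global learning rate across layers of very different magnitude at large batch sizes.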
KAISA: an adaptive second-order optimizer framework for deep neural networks
JG Pauloski, Q Huang, L Huang… - Proceedings of the …, 2021 - dl.acm.org
Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge
faster in deep neural network (DNN) training than stochastic gradient descent (SGD); …