On efficient training of large-scale deep learning models: A literature review
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …
GSVA: Generalized segmentation via multimodal large language models
Abstract Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …
Efficient diffusion transformer with step-wise dynamic attention mediators
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …
EfficientTrain: Exploring generalized curriculum learning for training visual backbones
The superior performance of modern deep networks usually comes with a costly training
procedure. This paper presents a new curriculum learning approach for the efficient training …
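A curriculum of this kind typically feeds the network "easier" inputs first and ramps up to the full data. One concrete form, which EfficientTrain is associated with, is restricting images to their low-frequency components early in training. The sketch below is a minimal, hypothetical illustration of that idea using an FFT-domain low-pass crop; the function name, `band` parameter, and the linear widening schedule in the comment are assumptions, not the paper's exact procedure.

```python
import numpy as np

def low_frequency_crop(img, band):
    """Keep only the central `band` x `band` low-frequency block of the
    image's 2D Fourier spectrum -- a stand-in for the 'easier-first'
    inputs used by frequency-based curricula.
    img: (H, W) array; band <= min(H, W), assumed even here."""
    f = np.fft.fftshift(np.fft.fft2(img))
    H, W = img.shape
    mask = np.zeros_like(f)
    cy, cx = H // 2, W // 2
    half = band // 2
    mask[cy - half:cy + half, cx - half:cx + half] = 1.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# A curriculum might widen the band over training, e.g. (hypothetical):
#   band = int(round(32 + (224 - 32) * step / total_steps))
```

With `band` equal to the full image size the crop is a no-op, so the schedule smoothly recovers standard training at the end.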
Reusing pretrained models by multi-linear operators for efficient training
Training large models from scratch usually costs a substantial amount of resources. Towards
this problem, recent studies such as bert2BERT and LiGO have reused small pretrained …
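The core trick behind reusing a small pretrained model is a function-preserving expansion: map the small model's weights into a larger architecture so the big model starts out computing the same function. Below is a minimal NumPy sketch of the classic Net2Net-style width expansion that this line of work (bert2BERT, LiGO) generalizes; the two-layer MLP and all variable names are illustrative assumptions, not any paper's actual operator.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small "pretrained" 2-layer MLP: x -> relu(W1 @ x + b1) -> W2 @ h
d_in, d_hidden, d_out = 4, 3, 2
W1 = rng.normal(size=(d_hidden, d_in)); b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))

def forward(x, W1, b1, W2):
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h

# Function-preserving width expansion: grow the hidden width from 3 to 5
# by duplicating random units and splitting their outgoing weights.
new_hidden = 5
idx = np.concatenate([np.arange(d_hidden),
                      rng.integers(0, d_hidden, new_hidden - d_hidden)])
counts = np.bincount(idx, minlength=d_hidden)

W1_big = W1[idx]                   # copy incoming weights of duplicated units
b1_big = b1[idx]
W2_big = W2[:, idx] / counts[idx]  # split outgoing weights so outputs match

x = rng.normal(size=d_in)
assert np.allclose(forward(x, W1, b1, W2),
                   forward(x, W1_big, b1_big, W2_big))
```

Because the expanded model's output is identical at initialization, training effectively resumes from the small model's loss instead of starting from scratch.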
On Efficient Training of Large-Scale Deep Learning Models
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …
A General and Efficient Training for Transformer via Token Expansion
The remarkable performance of Vision Transformers (ViTs) typically requires an extremely
large training cost. Existing methods have attempted to accelerate the training of ViTs yet …
Accelerating Augmentation Invariance Pretraining
Our work tackles the computational challenges of contrastive learning methods, particularly
for the pretraining of Vision Transformers (ViTs). Despite the effectiveness of contrastive …
Dynamic Patch Sampling for Efficient Training and Dynamic Inference in Vision Transformers
We introduce the notion of a Patch Sampling Schedule (PSS), that varies the number of
Vision Transformer (ViT) patches used per batch during training. Since all patches are not …
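The idea of a Patch Sampling Schedule can be sketched in a few lines: at each training step, a schedule decides what fraction of patch tokens to keep, and a random subset of that size is fed to the ViT. The linear ramp, the function names, and the choice to share one index set across the batch are all assumptions for illustration, not the paper's exact schedule.

```python
import numpy as np

rng = np.random.default_rng(0)

def patch_keep_ratio(step, total_steps, start=0.5, end=1.0):
    """Hypothetical linear schedule: use fewer patches early in
    training and ramp up to the full set by the end."""
    t = step / max(total_steps - 1, 1)
    return start + (end - start) * t

def sample_patches(tokens, keep_ratio):
    """tokens: (batch, num_patches, dim). Keep a random subset of
    patch tokens; one index set is shared across the batch here."""
    n = tokens.shape[1]
    k = max(1, int(round(n * keep_ratio)))
    idx = np.sort(rng.choice(n, size=k, replace=False))
    return tokens[:, idx, :]

tokens = rng.normal(size=(8, 196, 64))   # e.g. a 14x14 patch grid
early = sample_patches(tokens, patch_keep_ratio(0, 100))    # ~50% of patches
late  = sample_patches(tokens, patch_keep_ratio(99, 100))   # all 196 patches
```

Since self-attention cost grows quadratically in the token count, halving the patches early in training cuts per-step compute substantially, and the same knob can trade accuracy for speed at inference time.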