On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …

GSVA: Generalized segmentation via multimodal large language models

Z Xia, D Han, Y Han, X Pan, S Song… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generalized Referring Expression Segmentation (GRES) extends the scope of
classic RES to refer to multiple objects in one expression or identify the empty targets absent …

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - … on Computer Vision, 2025 - Springer
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …
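
For context, a minimal single-head sketch of the general mediator idea this snippet describes: a small set of M learned mediator tokens stands between the N queries and N keys, so the quadratic query-key interaction is replaced by two N-by-M attentions. All names and shapes below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def mediator_attention(q, k, v, mediators):
    """q, k, v: (N, d) token features; mediators: (M, d) learned tokens, M << N."""
    d = q.shape[-1]
    # Stage 1: mediators pool information from the full key/value set (M x N map).
    gather = F.softmax(mediators @ k.T / d**0.5, dim=-1)    # (M, N)
    v_med = gather @ v                                       # (M, d) compressed values
    # Stage 2: queries read from the small mediator set (N x M map).
    scatter = F.softmax(q @ mediators.T / d**0.5, dim=-1)   # (N, M)
    return scatter @ v_med                                   # (N, d)


N, M, d = 1024, 16, 64
q, k, v = (torch.randn(N, d) for _ in range(3))
mediators = torch.randn(M, d)                # learned parameters in practice
out = mediator_attention(q, k, v, mediators)  # cost O(N*M) instead of O(N^2)
```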

EfficientTrain: Exploring generalized curriculum learning for training visual backbones

Y Wang, Y Yue, R Lu, T Liu, Z Zhong… - Proceedings of the …, 2023 - openaccess.thecvf.com
The superior performance of modern deep networks usually comes with a costly training
procedure. This paper presents a new curriculum learning approach for the efficient training …
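
The snippet cuts off before describing the curriculum itself. As a hedged illustration of one common way an input-side curriculum can be instantiated (an assumption here, not necessarily this paper's schedule), the sketch below exposes only the low-frequency content of each image early in training and linearly grows the retained bandwidth toward the full spectrum.

```python
import torch


def low_frequency_crop(images, keep_ratio):
    """Keep only the central (low-frequency) region of the 2D spectrum.
    images: (B, C, H, W) float tensor; keep_ratio in (0, 1]."""
    spec = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    H, W = images.shape[-2:]
    h, w = int(H * keep_ratio), int(W * keep_ratio)
    mask = torch.zeros(H, W, dtype=torch.bool, device=images.device)
    mask[(H - h) // 2:(H + h) // 2, (W - w) // 2:(W + w) // 2] = True
    return torch.fft.ifft2(torch.fft.ifftshift(spec * mask, dim=(-2, -1))).real


def bandwidth_at(step, total_steps, start=0.3):
    """Linear curriculum from `start` of the spectrum up to the full spectrum."""
    return min(1.0, start + (1.0 - start) * step / total_steps)


batch = torch.randn(8, 3, 224, 224)
easy = low_frequency_crop(batch, bandwidth_at(step=0, total_steps=10_000))
```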

Reusing pretrained models by multi-linear operators for efficient training

Y Pan, Y Yuan, Y Yin, Z Xu, L Shang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Training large models from scratch usually costs a substantial amount of resources. To
address this problem, recent studies such as bert2BERT and LiGO have reused small pretrained …
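
For context, here is a minimal function-preserving widening of a two-layer MLP in the spirit of the model-growth methods the snippet names. The duplication-based (Net2Net-style) operator below is used purely as an illustration; the paper's multi-linear operators are more general and learned.

```python
import torch


def widen_mlp(w1, b1, w2, new_hidden):
    """Grow the hidden width of x -> w2 @ relu(w1 @ x + b1) from h to
    new_hidden while preserving the computed function."""
    h = w1.shape[0]
    idx = torch.randint(h, (new_hidden - h,))      # hidden units to duplicate
    w1_big = torch.cat([w1, w1[idx]], dim=0)       # copy incoming rows
    b1_big = torch.cat([b1, b1[idx]], dim=0)
    # Split each unit's outgoing weight evenly across its copies so the
    # widened network computes exactly the same outputs.
    counts = torch.ones(h).index_add_(0, idx, torch.ones(len(idx)))
    w2_big = torch.cat([w2 / counts, (w2 / counts)[:, idx]], dim=1)
    return w1_big, b1_big, w2_big


w1, b1, w2 = torch.randn(8, 16), torch.randn(8), torch.randn(4, 8)
x = torch.randn(3, 16)
W1, B1, W2 = widen_mlp(w1, b1, w2, new_hidden=12)
small = torch.relu(x @ w1.T + b1) @ w2.T
big = torch.relu(x @ W1.T + B1) @ W2.T
assert torch.allclose(small, big, atol=1e-5)       # same function, wider layer
```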

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

A General and Efficient Training for Transformer via Token Expansion

W Huang, Y Shen, J Xie, B Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The remarkable performance of Vision Transformers (ViTs) typically comes at an extremely
high training cost. Existing methods have attempted to accelerate the training of ViTs, yet …
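
A minimal sketch of the generic "train with fewer tokens first" idea behind token-expansion-style acceleration: early stages feed only a subset of patch tokens into the transformer, and the kept fraction grows over training. The staged schedule and uniform-stride selection below are assumptions for illustration; the paper's actual token selection criterion may differ.

```python
import torch


def keep_fraction(stage, fractions=(0.25, 0.5, 1.0)):
    """Staged schedule for the fraction of tokens used."""
    return fractions[min(stage, len(fractions) - 1)]


def expand_tokens(tokens, frac):
    """tokens: (B, N, d) patch embeddings; keep an evenly spaced subset."""
    N = tokens.shape[1]
    idx = torch.linspace(0, N - 1, steps=max(1, round(N * frac))).long()
    return tokens[:, idx]


tokens = torch.randn(8, 196, 384)                  # 14 x 14 patches
for stage in range(3):
    subset = expand_tokens(tokens, keep_fraction(stage))
    # ... train the ViT on `subset` for this stage's step budget ...
```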

Accelerating Augmentation Invariance Pretraining

J Lin, CE Wu, Y Wei, P Morgado - arXiv preprint arXiv:2410.22364, 2024 - arxiv.org
Our work tackles the computational challenges of contrastive learning methods, particularly
for the pretraining of Vision Transformers (ViTs). Despite the effectiveness of contrastive …
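
For background on why this pretraining is expensive: every image is encoded under multiple random augmentations, and each extra view multiplies the backbone's forward/backward cost. Below is a standard SimCLR-style InfoNCE loss shown for context only; the paper's specific acceleration techniques are not reproduced here.

```python
import torch
import torch.nn.functional as F


def info_nce(z1, z2, temperature=0.1):
    """z1, z2: (B, d) embeddings of two augmented views of the same B images."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature       # (B, B) view-to-view similarities
    labels = torch.arange(z1.shape[0])     # positives sit on the diagonal
    return F.cross_entropy(logits, labels)


B, d = 256, 128
loss = info_nce(torch.randn(B, d), torch.randn(B, d))
```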

Dynamic Patch Sampling for Efficient Training and Dynamic Inference in Vision Transformers

B McDanel, CPN Huynh - 2023 International Conference on …, 2023 - ieeexplore.ieee.org
We introduce the notion of a Patch Sampling Schedule (PSS), which varies the number of
Vision Transformer (ViT) patches used per batch during training. Since not all patches are …
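
A minimal sketch of a patch sampling schedule as the snippet describes it: the number of ViT patch tokens used per batch follows a training-time schedule. The cosine ramp from 40% to 100% of the patches and the random sampling rule below are illustrative assumptions, not the paper's exact PSS.

```python
import math
import torch


def patches_to_keep(step, total_steps, n_patches, lo=0.4):
    """Cosine schedule from lo * n_patches up to all n_patches."""
    t = step / total_steps
    frac = lo + (1.0 - lo) * 0.5 * (1.0 - math.cos(math.pi * t))
    return max(1, int(n_patches * frac))


def sample_patches(tokens, n_keep):
    """tokens: (B, N, d); keep a random subset of n_keep patches for this batch."""
    idx = torch.randperm(tokens.shape[1])[:n_keep]
    return tokens[:, idx]


tokens = torch.randn(32, 196, 384)
subset = sample_patches(tokens, patches_to_keep(step=100, total_steps=10_000,
                                                n_patches=tokens.shape[1]))
```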