Model stock: All we need is just a few fine-tuned models

DH Jang, S Yun, D Han - European Conference on Computer Vision, 2025 - Springer
This paper introduces an efficient fine-tuning method for large pre-trained models, offering
strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from …

Model merging in llms, mllms, and beyond: Methods, theories, applications and opportunities

E Yang, L Shen, G Guo, X Wang, X Cao… - arXiv preprint arXiv …, 2024 - arxiv.org
Model merging is an efficient technique in the machine learning community that requires
neither the collection of raw training data nor expensive …
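For orientation, one common merging recipe is task arithmetic: add the scaled parameter
deltas ("task vectors") of fine-tuned models to the pre-trained weights. A minimal sketch,
assuming PyTorch state dicts with matching keys and a hypothetical scaling factor lam
(this illustrates one representative method, not an algorithm from the survey itself):

    import torch

    def task_arithmetic_merge(pretrained_sd, finetuned_sds, lam=0.3):
        """Merge models by adding scaled task vectors (finetuned - pretrained) to the base."""
        merged = {}
        for key, base in pretrained_sd.items():
            # Sum the per-task parameter deltas, then scale and add to the base weights.
            delta = sum(sd[key].float() - base.float() for sd in finetuned_sds)
            merged[key] = base.float() + lam * delta
        return merged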

Deep model fusion: A survey

W Li, Y Peng, M Zhang, L Ding, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep model fusion/merging is an emerging technique that merges the parameters or
predictions of multiple deep learning models into a single one. It combines the abilities of …

Badmerging: Backdoor attacks against model merging

J Zhang, J Chi, Z Li, K Cai, Y Zhang… - Proceedings of the 2024 on …, 2024 - dl.acm.org
Fine-tuning pre-trained models for downstream tasks has led to a proliferation of
open-sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective …

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

Learning scalable model soup on a single gpu: An efficient subspace training strategy

T Li, W Jiang, F Liu, X Huang, JT Kwok - European Conference on …, 2025 - Springer
Pre-training followed by fine-tuning is widely adopted among practitioners. Performance
can be improved by “model soups” [46], which explore various hyperparameter configurations …
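For context, a uniform model soup simply averages the weights of several checkpoints
fine-tuned under different hyperparameters. A minimal sketch, assuming PyTorch-style
state dicts with matching keys:

    import torch

    def uniform_soup(state_dicts):
        """Uniformly average the parameters of several fine-tuned checkpoints."""
        soup = {}
        for key in state_dicts[0]:
            soup[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        return soup

    # Hypothetical usage: average three checkpoints from a hyperparameter sweep.
    # soup = uniform_soup([torch.load(p) for p in ("ft_a.pt", "ft_b.pt", "ft_c.pt")])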

Better Loss Landscape Visualization for Deep Neural Networks with Trajectory Information

R Ding, T Li, X Huang - Asian Conference on Machine …, 2024 - proceedings.mlr.press
The loss landscape of neural networks offers a valuable perspective on the trainability,
generalization, and robustness of networks, and hence its visualization has been …

Exponential moving average of weights in deep learning: Dynamics and benefits

D Morales-Brotons, T Vogels… - Transactions on Machine …, 2024 - openreview.net
Weight averaging of Stochastic Gradient Descent (SGD) iterates is a popular method for
training deep learning models. While it is often used as part of complex training pipelines to …
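For reference, an exponential moving average keeps a shadow copy of the weights that is
updated as ema = beta * ema + (1 - beta) * theta after each optimizer step. A minimal
PyTorch-style sketch (beta = 0.999 is an assumed, typical decay, not a value from the paper):

    import torch

    @torch.no_grad()
    def ema_update(ema_params, params, beta=0.999):
        """Pull the shadow (EMA) weights toward the current SGD iterate."""
        for e, p in zip(ema_params, params):
            e.mul_(beta).add_(p, alpha=1.0 - beta)

    # Hypothetical usage after each optimizer step:
    # ema_update(list(ema_model.parameters()), list(model.parameters()))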

Ensembling improves stability and power of feature selection for deep learning models

PK Gyawali, X Liu, J Zou, Z He - Machine Learning in …, 2022 - proceedings.mlr.press
With the growing adoption of deep learning models in different real-world domains,
including computational biology, it is often necessary to understand which data features are …