On efficient training of large-scale deep learning models: A literature review
The field of deep learning has witnessed significant progress, particularly in computer vision
(CV), natural language processing (NLP), and speech. The use of large-scale models …
Model stock: All we need is just a few fine-tuned models
This paper introduces an efficient fine-tuning method for large pre-trained models, offering
strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from …
Model merging in llms, mllms, and beyond: Methods, theories, applications and opportunities
Model merging is an efficient empowerment technique in the machine learning community
that requires neither the collection of raw training data nor expensive …
Deep model fusion: A survey
Deep model fusion/merging is an emerging technique that merges the parameters or
predictions of multiple deep learning models into a single one. It combines the abilities of …
Badmerging: Backdoor attacks against model merging
Fine-tuning pre-trained models for downstream tasks has led to a proliferation of open-
sourced task-specific models. Recently, Model Merging (MM) has emerged as an effective …
Learning scalable model soup on a single gpu: An efficient subspace training strategy
Pre-training followed by fine-tuning is widely adopted among practitioners. The performance
can be improved by “model soups”[46] via exploring various hyperparameter configurations …
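The "model soup" idea referenced in this snippet, averaging the weights of several fine-tuned checkpoints of the same architecture into one model, can be illustrated with a minimal NumPy sketch. The `uniform_soup` helper and the toy parameter dicts below are hypothetical, not the paper's implementation:

```python
import numpy as np

def uniform_soup(param_sets):
    """Uniformly average the parameters of several fine-tuned models.

    `param_sets` is a list of dicts mapping parameter names to arrays;
    all models are assumed to share the same architecture (same keys/shapes).
    """
    keys = param_sets[0].keys()
    return {k: np.mean([p[k] for p in param_sets], axis=0) for k in keys}

# Three hypothetical fine-tuned checkpoints of a one-parameter "model".
models = [{"w": np.array([1.0, 2.0])},
          {"w": np.array([3.0, 4.0])},
          {"w": np.array([2.0, 3.0])}]
soup = uniform_soup(models)
print(soup["w"])  # -> [2. 3.]
```

In practice the checkpoints would come from different hyperparameter configurations of the same fine-tuning run, and a "greedy soup" variant adds a checkpoint only if it improves held-out accuracy.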
Better Loss Landscape Visualization for Deep Neural Networks with Trajectory Information
The loss landscape of neural networks is a valuable perspective for studying the trainability,
generalization, and robustness of networks, and hence its visualization has been …
Exponential moving average of weights in deep learning: Dynamics and benefits
D Morales-Brotons, T Vogels… - Transactions on Machine …, 2024 - openreview.net
Weight averaging of Stochastic Gradient Descent (SGD) iterates is a popular method for
training deep learning models. While it is often used as part of complex training pipelines to …
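The weight averaging this snippet studies is an exponential moving average (EMA) over SGD iterates. A minimal sketch follows; the `ema_update` helper, the decay value, and the toy scalar weights are hypothetical and only illustrate the update rule:

```python
def ema_update(ema_params, new_params, decay=0.99):
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * new."""
    return {k: decay * ema_params[k] + (1.0 - decay) * new_params[k]
            for k in ema_params}

# Track an EMA of a single scalar weight over a few SGD "iterates".
ema = {"w": 0.0}
for step_w in [1.0, 1.0, 1.0]:
    ema = ema_update(ema, {"w": step_w}, decay=0.5)
print(ema["w"])  # -> 0.875
```

With a decay close to 1 the EMA changes slowly and smooths out the noise in individual SGD iterates, which is why the averaged weights are often used for evaluation rather than the raw ones.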
Ensembling improves stability and power of feature selection for deep learning models
With the growing adoption of deep learning models in different real-world domains,
including computational biology, it is often necessary to understand which data features are …