Dataset distillation: A comprehensive review
Recent success of deep learning is largely attributed to the sheer amount of data used for
training deep neural networks. Despite the unprecedented success, the massive data …
training deep neural networks. Despite the unprecedented success, the massive data …
Domain generalization: A survey
Generalization to out-of-distribution (OOD) data is a capability natural to humans yet
challenging for machines to reproduce. This is because most learning algorithms strongly …
challenging for machines to reproduce. This is because most learning algorithms strongly …
Scaling vision transformers to 22 billion parameters
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …
present, the largest large language models (LLMs) contain upwards of 100B parameters …
Reproducible scaling laws for contrastive language-image learning
M Cherti, R Beaumont, R Wightman… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scaling up neural networks has led to remarkable performance across a wide range of
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
tasks. Moreover, performance often follows reliable scaling laws as a function of training set …
Better diffusion models further improve adversarial training
It has been recognized that the data generated by the denoising diffusion probabilistic
model (DDPM) improves adversarial training. After two years of rapid development in …
model (DDPM) improves adversarial training. After two years of rapid development in …
Cross-entropy loss functions: Theoretical analysis and applications
Cross-entropy is a widely used loss function in applications. It coincides with the logistic loss
applied to the outputs of a neural network, when the softmax is used. But, what guarantees …
applied to the outputs of a neural network, when the softmax is used. But, what guarantees …
Datacomp: In search of the next generation of multimodal datasets
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Diffusion and GPT-4, yet their design does not receive the same research attention as model …
Visual prompt tuning
The current modus operandi in adapting pre-trained models involves updating all the
backbone parameters, ie., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …
backbone parameters, ie., full fine-tuning. This paper introduces Visual Prompt Tuning (VPT) …
Mitigating neural network overconfidence with logit normalization
Detecting out-of-distribution inputs is critical for the safe deployment of machine learning
models in the real world. However, neural networks are known to suffer from the …
models in the real world. However, neural networks are known to suffer from the …