MIC: Masked image consistency for context-enhanced domain adaptation
In unsupervised domain adaptation (UDA), a model trained on source data (e.g., synthetic) is adapted to target data (e.g., real-world) without access to target annotations. Most previous …
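The title points at a consistency objective on unlabeled target images. Below is a minimal sketch, assuming a classification setup where a frozen (e.g., EMA) teacher pseudo-labels the full image and the student must reproduce those labels from a patch-masked view; `mask_patches` and both networks are illustrative stand-ins, not the paper's code:

```python
# Hedged sketch of masked consistency training on unlabeled target images.
import torch
import torch.nn.functional as F

def mask_patches(x, patch=16, mask_ratio=0.5):
    """Zero out a random subset of non-overlapping patches.
    Assumes H and W are divisible by `patch`."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio).float()
    keep = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return x * keep

def masked_consistency_loss(student, teacher, target_images):
    with torch.no_grad():                        # teacher is not updated here
        pseudo = teacher(target_images).argmax(dim=1)
    logits = student(mask_patches(target_images))
    return F.cross_entropy(logits, pseudo)
```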
Multimodal foundation models: From specialists to general-purpose assistants
An introduction to neural data compression
Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened …
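As one concrete instance of "neural networks applied to data compression", here is a minimal transform-coding sketch: an autoencoder whose latent is rounded to integers, trained on distortion plus a crude rate proxy. `TinyCodec` and the L1 rate term are assumptions for illustration; real neural codecs replace the proxy with a learned entropy model:

```python
# Hedged sketch of learned transform coding with straight-through rounding.
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        y = self.enc(x)
        y_hat = y + (y.round() - y).detach()     # straight-through quantization
        x_hat = self.dec(y_hat)
        distortion = (x - x_hat).pow(2).mean()
        rate_proxy = y_hat.abs().mean()          # stand-in for an entropy model
        return distortion + 0.01 * rate_proxy    # rate-distortion trade-off
```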
Unmasked teacher: Towards training-efficient video foundation models
Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models …
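The excerpt cuts off before the method, but the title suggests aligning a student's unmasked tokens with an image-model teacher. A heavily hedged sketch, assuming a masked-token distillation setup; `unmasked_alignment_loss` and the token layout are assumptions, not the paper's exact recipe:

```python
# Hedged sketch: keep a small random subset of patch tokens and align only
# those ("unmasked") student tokens with a frozen teacher's features.
import torch
import torch.nn.functional as F

def unmasked_alignment_loss(student_tokens, teacher_tokens, keep_ratio=0.2):
    # tokens: (batch, num_tokens, dim); teacher computed under no_grad upstream
    b, n, d = student_tokens.shape
    k = max(1, int(n * keep_ratio))
    idx = torch.rand(b, n, device=student_tokens.device).argsort(dim=1)[:, :k]
    gather = lambda t: t.gather(1, idx.unsqueeze(-1).expand(b, k, d))
    s = F.normalize(gather(student_tokens), dim=-1)
    t = F.normalize(gather(teacher_tokens), dim=-1)
    return (1 - (s * t).sum(-1)).mean()          # cosine alignment on kept tokens
```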
Context autoencoder for self-supervised representation learning
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE),
for self-supervised representation pretraining. We pretrain an encoder by making predictions …
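The stated idea, pretraining an encoder "by making predictions" about masked content, can be sketched as prediction in latent space: encode the visible patches, regress latents for the masked positions, and match them to a frozen target encoder. All module names here are stand-ins, not the CAE code:

```python
# Hedged sketch of predicting masked regions in latent space.
import torch
import torch.nn as nn

dim, n_vis, n_mask = 128, 40, 9
encoder = nn.Linear(768, dim)            # stand-in for a patch encoder
regressor = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
target_encoder = nn.Linear(768, dim).requires_grad_(False)

visible = torch.randn(2, n_vis, 768)     # visible patch pixels (flattened)
masked = torch.randn(2, n_mask, 768)     # masked patch pixels (targets only)

z_vis = encoder(visible)
query = torch.zeros(2, n_mask, dim)      # mask queries for hidden positions
pred = regressor(torch.cat([z_vis, query], dim=1))[:, n_vis:]
with torch.no_grad():
    z_tgt = target_encoder(masked)
loss = (pred - z_tgt).pow(2).mean()      # align predicted and true latents
```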
Distilling large vision-language model with out-of-distribution generalizability
Large vision-language models have achieved outstanding performance, but their size and
computational requirements make their deployment on resource-constrained devices and …
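The distillation backbone of such work is typically the standard Hinton-style logit-matching recipe, sketched below; this is the generic technique, not this paper's full OOD-oriented method:

```python
# Standard temperature-scaled logit distillation from a frozen teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```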
SemMAE: Semantic-guided masking for learning masked autoencoders
Recently, significant progress has been made in masked image modeling to catch up to
masked language modeling. However, unlike words in NLP, the lack of semantic …
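A hedged sketch of what "semantic-guided masking" can mean operationally: sample the mask from a per-patch importance score (e.g., an attention map) instead of uniformly at random, so masking can follow part structure. The score source and the Gumbel-top-k sampler are assumptions:

```python
# Score-guided patch masking via noisy top-k (Gumbel noise makes the mask
# a stochastic function of the scores; exact Gumbel-top-k expects log-probs).
import torch

def guided_mask(scores, mask_ratio=0.75):
    """scores: (batch, num_patches) importance per patch; returns bool mask."""
    b, n = scores.shape
    k = int(n * mask_ratio)
    gumbel = -torch.log(-torch.log(torch.rand_like(scores)))
    idx = (scores + gumbel).topk(k, dim=1).indices
    mask = torch.zeros(b, n, dtype=torch.bool, device=scores.device)
    return mask.scatter(1, idx, True)    # True = patch is masked out
```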
Hard patches mining for masked image modeling
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …
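One hedged reading of "hard patches mining": alongside reconstruction, train an auxiliary head to predict each patch's reconstruction error, then mask the patches it ranks hardest. The `difficulty_head` and both functions are illustrative assumptions:

```python
# Hedged sketch of mining hard patches via a learned difficulty predictor.
import torch
import torch.nn as nn

dim = 128
difficulty_head = nn.Linear(dim, 1)      # predicts per-patch loss

def choose_hard_mask(patch_feats, mask_ratio=0.75):
    scores = difficulty_head(patch_feats).squeeze(-1)   # (batch, num_patches)
    k = int(scores.shape[1] * mask_ratio)
    idx = scores.topk(k, dim=1).indices                 # hardest patches
    mask = torch.zeros_like(scores, dtype=torch.bool)
    return mask.scatter(1, idx, True), scores

def difficulty_loss(scores, per_patch_recon_loss):
    # regress predicted difficulty onto the observed (detached) errors
    return (scores - per_patch_recon_loss.detach()).pow(2).mean()
```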
Mixed autoencoder for self-supervised visual representation learning
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks
via randomly masking image patches and reconstruction. However, effective data …
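The excerpt describes MAE's objective directly: randomly mask patch tokens and score reconstruction only on the masked positions. A minimal sketch of that baseline objective (MixedAE's image-mixing augmentation is not shown):

```python
# Minimal MAE-style masked reconstruction loss over patch tokens.
import torch

def mae_loss(model, patches, mask_ratio=0.75):
    """patches: (batch, num_patches, patch_dim); model maps same shape."""
    b, n, d = patches.shape
    mask = torch.rand(b, n, device=patches.device) < mask_ratio
    visible = patches * (~mask).unsqueeze(-1)    # zero out masked tokens
    recon = model(visible)
    err = (recon - patches).pow(2).mean(dim=-1)  # per-patch MSE
    return (err * mask).sum() / mask.sum().clamp(min=1)
```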
Improving pixel-based MIM by reducing wasted modeling capability
There has been significant progress in Masked Image Modeling (MIM). Existing MIM
methods can be broadly categorized into two groups based on the reconstruction target …
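For the "pixel reconstruction target" family the excerpt refers to, the common setup regresses per-patch-normalized raw pixels, sketched below; this is the generic pixel-target baseline, not this paper's specific improvement:

```python
# Per-patch-normalized pixel targets, the usual pixel-based MIM objective.
import torch

def normalized_pixel_target(patches, eps=1e-6):
    """patches: (batch, num_patches, patch_dim) raw pixel values."""
    mean = patches.mean(dim=-1, keepdim=True)
    std = patches.std(dim=-1, keepdim=True)
    return (patches - mean) / (std + eps)

pred = torch.randn(2, 196, 768)                  # decoder output (stand-in)
target = normalized_pixel_target(torch.rand(2, 196, 768))
loss = (pred - target).pow(2).mean()
```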
Supervised masked knowledge distillation for few-shot transformers
Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local …
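Reading the title at face value, "supervised masked knowledge distillation" suggests combining label supervision with teacher distillation on masked views; this combination is an assumption drawn from the title, not the paper's exact loss:

```python
# Hedged sketch: cross-entropy on labels plus logit distillation, with the
# student seeing a masked view while the frozen teacher sees the full image.
import torch
import torch.nn.functional as F

def supervised_masked_kd(student, teacher, images, masked_images, labels, T=2.0):
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(masked_images)
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, -1),
                  F.softmax(t_logits / T, -1),
                  reduction="batchmean") * T * T
    return ce + kd
```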