MIC: Masked image consistency for context-enhanced domain adaptation

L Hoyer, D Dai, H Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In unsupervised domain adaptation (UDA), a model trained on source data (e.g., synthetic) is
adapted to target data (e.g., real-world) without access to target annotation. Most previous …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Unmasked teacher: Towards training-efficient video foundation models

K Li, Y Wang, Y Li, Y Wang, Y He… - Proceedings of the …, 2023 - openaccess.thecvf.com
Video Foundation Models (VFMs) have received limited exploration due to high
computational costs and data scarcity. Previous VFMs rely on Image Foundation Models …

Context autoencoder for self-supervised representation learning

X Chen, M Ding, X Wang, Y Xin, S Mo, Y Wang… - International Journal of …, 2024 - Springer
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE),
for self-supervised representation pretraining. We pretrain an encoder by making predictions …

Distilling large vision-language model with out-of-distribution generalizability

X Li, Y Fang, M Liu, Z Ling, Z Tu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Large vision-language models have achieved outstanding performance, but their size and
computational requirements make their deployment on resource-constrained devices and …

SemMAE: Semantic-guided masking for learning masked autoencoders

G Li, H Zheng, D Liu, C Wang, B Su… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, significant progress has been made in masked image modeling to catch up to
masked language modeling. However, unlike words in NLP, the lack of semantic …

Hard patches mining for masked image modeling

H Wang, K Song, J Fan, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …

Mixed autoencoder for self-supervised visual representation learning

K Chen, Z Liu, L Hong, H Xu, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks
via randomly masking image patches and reconstruction. However, effective data …

Improving pixel-based MIM by reducing wasted modeling capability

Y Liu, S Zhang, J Chen, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
There has been significant progress in Masked Image Modeling (MIM). Existing MIM
methods can be broadly categorized into two groups based on the reconstruction target …

Supervised masked knowledge distillation for few-shot transformers

H Lin, G Han, J Ma, S Huang, X Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision Transformers (ViTs) emerge to achieve impressive performance on many
data-abundant computer vision tasks by capturing long-range dependencies among local …