MIC: Masked image consistency for context-enhanced domain adaptation
In unsupervised domain adaptation (UDA), a model trained on source data (e.g., synthetic) is adapted to target data (e.g., real-world) without access to target annotations. Most previous …
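The title points at a consistency objective on unlabeled target images. Below is a minimal sketch, assuming a classification setup where a frozen (e.g., EMA) teacher pseudo-labels the full image and the student must reproduce those labels from a patch-masked view; `mask_patches` and both networks are illustrative stand-ins, not the paper's code:

```python
# Hedged sketch of masked consistency training on unlabeled target images.
import torch
import torch.nn.functional as F

def mask_patches(x, patch=16, mask_ratio=0.5):
    """Zero out a random subset of non-overlapping patches.
    Assumes H and W are divisible by `patch`."""
    b, c, h, w = x.shape
    gh, gw = h // patch, w // patch
    keep = (torch.rand(b, 1, gh, gw, device=x.device) > mask_ratio).float()
    keep = keep.repeat_interleave(patch, 2).repeat_interleave(patch, 3)
    return x * keep

def masked_consistency_loss(student, teacher, target_images):
    with torch.no_grad():                        # teacher is not updated here
        pseudo = teacher(target_images).argmax(dim=1)
    logits = student(mask_patches(target_images))
    return F.cross_entropy(logits, pseudo)
```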
Multimodal foundation models: From specialists to general-purpose assistants
An introduction to neural data compression
Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened …
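As one concrete instance of "neural networks applied to data compression", here is a minimal transform-coding sketch: an autoencoder whose latent is rounded to integers, trained on distortion plus a crude rate proxy. `TinyCodec` and the L1 rate term are assumptions for illustration; real neural codecs replace the proxy with a learned entropy model:

```python
# Hedged sketch of learned transform coding with straight-through rounding.
import torch
import torch.nn as nn

class TinyCodec(nn.Module):
    def __init__(self, dim=784, latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, latent))
        self.dec = nn.Sequential(nn.Linear(latent, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, x):
        y = self.enc(x)
        y_hat = y + (y.round() - y).detach()     # straight-through quantization
        x_hat = self.dec(y_hat)
        distortion = (x - x_hat).pow(2).mean()
        rate_proxy = y_hat.abs().mean()          # stand-in for an entropy model
        return distortion + 0.01 * rate_proxy    # rate-distortion trade-off
```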
Unmasked teacher: Towards training-efficient video foundation models
Video Foundation Models (VFMs) have received limited exploration due to high computational costs and data scarcity. Previous VFMs rely on Image Foundation Models …
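The excerpt cuts off before the method, but the title suggests aligning a student's unmasked tokens with an image-model teacher. A heavily hedged sketch, assuming a masked-token distillation setup; `unmasked_alignment_loss` and the token layout are assumptions, not the paper's exact recipe:

```python
# Hedged sketch: keep a small random subset of patch tokens and align only
# those ("unmasked") student tokens with a frozen teacher's features.
import torch
import torch.nn.functional as F

def unmasked_alignment_loss(student_tokens, teacher_tokens, keep_ratio=0.2):
    # tokens: (batch, num_tokens, dim); teacher computed under no_grad upstream
    b, n, d = student_tokens.shape
    k = max(1, int(n * keep_ratio))
    idx = torch.rand(b, n, device=student_tokens.device).argsort(dim=1)[:, :k]
    gather = lambda t: t.gather(1, idx.unsqueeze(-1).expand(b, k, d))
    s = F.normalize(gather(student_tokens), dim=-1)
    t = F.normalize(gather(teacher_tokens), dim=-1)
    return (1 - (s * t).sum(-1)).mean()          # cosine alignment on kept tokens
```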
Context autoencoder for self-supervised representation learning
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE),
for self-supervised representation pretraining. We pretrain an encoder by making predictions …
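The stated idea, pretraining an encoder "by making predictions" about masked content, can be sketched as prediction in latent space: encode the visible patches, regress latents for the masked positions, and match them to a frozen target encoder. All module names here are stand-ins, not the CAE code:

```python
# Hedged sketch of predicting masked regions in latent space.
import torch
import torch.nn as nn

dim, n_vis, n_mask = 128, 40, 9
encoder = nn.Linear(768, dim)            # stand-in for a patch encoder
regressor = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
target_encoder = nn.Linear(768, dim).requires_grad_(False)

visible = torch.randn(2, n_vis, 768)     # visible patch pixels (flattened)
masked = torch.randn(2, n_mask, 768)     # masked patch pixels (targets only)

z_vis = encoder(visible)
query = torch.zeros(2, n_mask, dim)      # mask queries for hidden positions
pred = regressor(torch.cat([z_vis, query], dim=1))[:, n_vis:]
with torch.no_grad():
    z_tgt = target_encoder(masked)
loss = (pred - z_tgt).pow(2).mean()      # align predicted and true latents
```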
Distilling large vision-language model with out-of-distribution generalizability
Large vision-language models have achieved outstanding performance, but their size and
computational requirements make their deployment on resource-constrained devices and …
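The distillation backbone of such work is typically the standard Hinton-style logit-matching recipe, sketched below; this is the generic technique, not this paper's full OOD-oriented method:

```python
# Standard temperature-scaled logit distillation from a frozen teacher.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=4.0):
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # T^2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```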
SemMAE: Semantic-guided masking for learning masked autoencoders
Recently, significant progress has been made in masked image modeling to catch up to
masked language modeling. However, unlike words in NLP, the lack of semantic …
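A hedged sketch of what "semantic-guided masking" can mean operationally: sample the mask from a per-patch importance score (e.g., an attention map) instead of uniformly at random, so masking can follow part structure. The score source and the Gumbel-top-k sampler are assumptions:

```python
# Score-guided patch masking via noisy top-k (Gumbel noise makes the mask
# a stochastic function of the scores; exact Gumbel-top-k expects log-probs).
import torch

def guided_mask(scores, mask_ratio=0.75):
    """scores: (batch, num_patches) importance per patch; returns bool mask."""
    b, n = scores.shape
    k = int(n * mask_ratio)
    gumbel = -torch.log(-torch.log(torch.rand_like(scores)))
    idx = (scores + gumbel).topk(k, dim=1).indices
    mask = torch.zeros(b, n, dtype=torch.bool, device=scores.device)
    return mask.scatter(1, idx, True)    # True = patch is masked out
```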
Hard patches mining for masked image modeling
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …
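One hedged reading of "hard patches mining": alongside reconstruction, train an auxiliary head to predict each patch's reconstruction error, then mask the patches it ranks hardest. The `difficulty_head` and both functions are illustrative assumptions:

```python
# Hedged sketch of mining hard patches via a learned difficulty predictor.
import torch
import torch.nn as nn

dim = 128
difficulty_head = nn.Linear(dim, 1)      # predicts per-patch loss

def choose_hard_mask(patch_feats, mask_ratio=0.75):
    scores = difficulty_head(patch_feats).squeeze(-1)   # (batch, num_patches)
    k = int(scores.shape[1] * mask_ratio)
    idx = scores.topk(k, dim=1).indices                 # hardest patches
    mask = torch.zeros_like(scores, dtype=torch.bool)
    return mask.scatter(1, idx, True), scores

def difficulty_loss(scores, per_patch_recon_loss):
    # regress predicted difficulty onto the observed (detached) errors
    return (scores - per_patch_recon_loss.detach()).pow(2).mean()
```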
Mixed autoencoder for self-supervised visual representation learning
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks
via randomly masking image patches and reconstruction. However, effective data …
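The excerpt describes MAE's objective directly: randomly mask patch tokens and score reconstruction only on the masked positions. A minimal sketch of that baseline objective (MixedAE's image-mixing augmentation is not shown):

```python
# Minimal MAE-style masked reconstruction loss over patch tokens.
import torch

def mae_loss(model, patches, mask_ratio=0.75):
    """patches: (batch, num_patches, patch_dim); model maps same shape."""
    b, n, d = patches.shape
    mask = torch.rand(b, n, device=patches.device) < mask_ratio
    visible = patches * (~mask).unsqueeze(-1)    # zero out masked tokens
    recon = model(visible)
    err = (recon - patches).pow(2).mean(dim=-1)  # per-patch MSE
    return (err * mask).sum() / mask.sum().clamp(min=1)
```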
Improving pixel-based MIM by reducing wasted modeling capability
There has been significant progress in Masked Image Modeling (MIM). Existing MIM
methods can be broadly categorized into two groups based on the reconstruction target …
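For the "pixel reconstruction target" family the excerpt refers to, the common setup regresses per-patch-normalized raw pixels, sketched below; this is the generic pixel-target baseline, not this paper's specific improvement:

```python
# Per-patch-normalized pixel targets, the usual pixel-based MIM objective.
import torch

def normalized_pixel_target(patches, eps=1e-6):
    """patches: (batch, num_patches, patch_dim) raw pixel values."""
    mean = patches.mean(dim=-1, keepdim=True)
    std = patches.std(dim=-1, keepdim=True)
    return (patches - mean) / (std + eps)

pred = torch.randn(2, 196, 768)                  # decoder output (stand-in)
target = normalized_pixel_target(torch.rand(2, 196, 768))
loss = (pred - target).pow(2).mean()
```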
Supervised masked knowledge distillation for few-shot transformers
Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local …
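Reading the title at face value, "supervised masked knowledge distillation" suggests combining label supervision with teacher distillation on masked views; this combination is an assumption drawn from the title, not the paper's exact loss:

```python
# Hedged sketch: cross-entropy on labels plus logit distillation, with the
# student seeing a masked view while the frozen teacher sees the full image.
import torch
import torch.nn.functional as F

def supervised_masked_kd(student, teacher, images, masked_images, labels, T=2.0):
    with torch.no_grad():
        t_logits = teacher(images)
    s_logits = student(masked_images)
    ce = F.cross_entropy(s_logits, labels)
    kd = F.kl_div(F.log_softmax(s_logits / T, -1),
                  F.softmax(t_logits / T, -1),
                  reduction="batchmean") * T * T
    return ce + kd
```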