Multimodal foundation models: From specialists to general-purpose assistants
An introduction to neural data compression
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders
Pre-training on massive image data has become the de facto standard for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …
Context autoencoder for self-supervised representation learning
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE),
for self-supervised representation pretraining. We pretrain an encoder by making predictions …
Hiera: A hierarchical vision transformer without the bells-and-whistles
Modern hierarchical vision transformers have added several vision-specific components in
the pursuit of supervised classification performance. While these components lead to …
What to hide from your students: Attention-guided masked image modeling
Transformers and masked language modeling are quickly being adopted and explored in
computer vision as vision transformers and masked image modeling (MIM). In this work, we …
Masked image modeling with local multi-scale reconstruction
Masked Image Modeling (MIM) achieves outstanding success in self-supervised
representation learning. Unfortunately, MIM models typically have huge computational …
A survey on masked autoencoder for self-supervised learning in vision and beyond
"Masked autoencoders are scalable vision learners", the title of MAE \cite{he2022masked},
suggests that self-supervised learning (SSL) in vision might undertake a similar …
Masked modeling for self-supervised representation learning on vision and beyond
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …
TorchSparse++: Efficient training and inference framework for sparse convolution on GPUs
Sparse convolution plays a pivotal role in emerging workloads, including point cloud
processing in AR/VR, autonomous driving, and graph understanding in recommendation …