A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
SimMIM: A simple framework for masked image modeling
This paper presents SimMIM, a simple framework for masked image modeling. We have
simplified recently proposed relevant approaches, without the need for special designs …
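The SimMIM abstract above names two ingredients: randomly masking image patches and predicting the raw pixels of the masked patches. A minimal NumPy sketch of those two pieces follows; the function names and the 14×14-patch toy shapes are illustrative, not from the paper's code.

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[idx] = True
    return mask

def masked_l1_loss(pred: np.ndarray, target: np.ndarray,
                   mask: np.ndarray) -> float:
    """Mean absolute pixel error, computed only on the masked patches."""
    return float(np.abs(pred[mask] - target[mask]).mean())

# Toy example: 196 patches (a 14x14 grid), each a flattened pixel vector.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 768))
mask = random_patch_mask(196, mask_ratio=0.6, rng=rng)
pred = np.zeros_like(patches)  # a trivial stand-in "predictor"
loss = masked_l1_loss(pred, patches, mask)
```

Scoring the loss only on masked patches is the point: the model gets no credit for copying visible pixels, so it must infer the hidden content.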
Masked siamese networks for label-efficient learning
We propose Masked Siamese Networks (MSN), a self-supervised learning
framework for learning image representations. Our approach matches the representation of …
VATT: Transformers for multimodal self-supervised learning from raw video, audio and text
We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …
Extended vision transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework
The recent success of attention-mechanism-driven deep models, with the vision transformer (ViT)
as one of the most representative examples, has inspired a wave of advanced research to explore …
BEVT: BERT pretraining of video transformers
This paper studies the BERT pretraining of video transformers. It is a straightforward but
worth-studying extension given the recent success from BERT pretraining of image …
Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization
L Melas-Kyriazi, C Rupprecht… - Proceedings of the …, 2022 - openaccess.thecvf.com
Unsupervised localization and segmentation are long-standing computer vision challenges
that involve decomposing an image into semantically-meaningful segments without any …
GAN-based anomaly detection: A review
X Xia, X Pan, N Li, X He, L Ma, X Zhang, N Ding - Neurocomputing, 2022 - Elsevier
Supervised learning algorithms have shown limited use in the field of anomaly detection due
to the unpredictability and difficulty in acquiring abnormal samples. In recent years …
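The GAN-based anomaly detectors this review covers commonly score a test sample by how poorly a generator trained only on normal data can reconstruct it, optionally combined with a distance in discriminator feature space (the AnoGAN-style scoring scheme). A minimal NumPy sketch, assuming precomputed reconstructions and features; the variable names and weighting are illustrative.

```python
import numpy as np

def anomaly_score(x: np.ndarray, x_rec: np.ndarray,
                  f_x: np.ndarray, f_rec: np.ndarray,
                  lam: float = 0.1) -> float:
    # Residual loss: how far the generator's reconstruction is from x.
    residual = np.abs(x - x_rec).sum()
    # Feature-matching loss: distance in discriminator feature space.
    feature = np.abs(f_x - f_rec).sum()
    return float((1.0 - lam) * residual + lam * feature)

rng = np.random.default_rng(1)
normal = rng.normal(size=64)
# A generator trained on normal data reconstructs normal samples closely...
score_normal = anomaly_score(normal, normal + 0.01 * rng.normal(size=64),
                             np.ones(8), np.ones(8))
# ...but cannot reproduce an out-of-distribution sample, so its score is high.
anomaly = normal + 3.0
score_anomaly = anomaly_score(anomaly, normal, np.ones(8), 1.5 * np.ones(8))
```

High scores flag anomalies, which is why such methods need no labeled abnormal samples at training time.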
Convolutional neural networks for multimodal remote sensing data classification
In recent years, enormous efforts have been made to improve the classification
performance of single-modal remote sensing (RS) data. However, with the ever-growing …