A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
SimMIM: A simple framework for masked image modeling
This paper presents SimMIM, a simple framework for masked image modeling. We have
simplified recently proposed relevant approaches, without the need for special designs …
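The SimMIM abstract above names two ingredients: randomly masking image patches and predicting the raw pixels of the masked patches. A minimal NumPy sketch of those two pieces follows; the function names and the 14×14-patch toy shapes are illustrative, not from the paper's code.

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float,
                      rng: np.random.Generator) -> np.ndarray:
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(num_patches * mask_ratio)
    mask = np.zeros(num_patches, dtype=bool)
    idx = rng.choice(num_patches, size=num_masked, replace=False)
    mask[idx] = True
    return mask

def masked_l1_loss(pred: np.ndarray, target: np.ndarray,
                   mask: np.ndarray) -> float:
    """Mean absolute pixel error, computed only on the masked patches."""
    return float(np.abs(pred[mask] - target[mask]).mean())

# Toy example: 196 patches (a 14x14 grid), each a flattened pixel vector.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 768))
mask = random_patch_mask(196, mask_ratio=0.6, rng=rng)
pred = np.zeros_like(patches)  # a trivial stand-in "predictor"
loss = masked_l1_loss(pred, patches, mask)
```

Scoring the loss only on masked patches is the point: the model gets no credit for copying visible pixels, so it must infer the hidden content.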
Masked siamese networks for label-efficient learning
We propose Masked Siamese Networks (MSN), a self-supervised learning
framework for learning image representations. Our approach matches the representation of …
VATT: Transformers for multimodal self-supervised learning from raw video, audio and text
We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …
Extended vision transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework
The recent success of attention-mechanism-driven deep models, with the vision transformer (ViT)
as one of the most representative examples, has inspired a wave of advanced research to explore …
BEVT: BERT pretraining of video transformers
This paper studies the BERT pretraining of video transformers. It is a straightforward but
worth-studying extension given the recent success from BERT pretraining of image …
Deep spectral methods: A surprisingly strong baseline for unsupervised semantic segmentation and localization
L Melas-Kyriazi, C Rupprecht… - Proceedings of the …, 2022 - openaccess.thecvf.com
Unsupervised localization and segmentation are long-standing computer vision challenges
that involve decomposing an image into semantically-meaningful segments without any …
GAN-based anomaly detection: A review
X Xia, X Pan, N Li, X He, L Ma, X Zhang, N Ding - Neurocomputing, 2022 - Elsevier
Supervised learning algorithms have shown limited use in the field of anomaly detection due
to the unpredictability and difficulty in acquiring abnormal samples. In recent years …
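The GAN-based anomaly detectors this review covers commonly score a test sample by how poorly a generator trained only on normal data can reconstruct it, optionally combined with a distance in discriminator feature space (the AnoGAN-style scoring scheme). A minimal NumPy sketch, assuming precomputed reconstructions and features; the variable names and weighting are illustrative.

```python
import numpy as np

def anomaly_score(x: np.ndarray, x_rec: np.ndarray,
                  f_x: np.ndarray, f_rec: np.ndarray,
                  lam: float = 0.1) -> float:
    # Residual loss: how far the generator's reconstruction is from x.
    residual = np.abs(x - x_rec).sum()
    # Feature-matching loss: distance in discriminator feature space.
    feature = np.abs(f_x - f_rec).sum()
    return float((1.0 - lam) * residual + lam * feature)

rng = np.random.default_rng(1)
normal = rng.normal(size=64)
# A generator trained on normal data reconstructs normal samples closely...
score_normal = anomaly_score(normal, normal + 0.01 * rng.normal(size=64),
                             np.ones(8), np.ones(8))
# ...but cannot reproduce an out-of-distribution sample, so its score is high.
anomaly = normal + 3.0
score_anomaly = anomaly_score(anomaly, normal, np.ones(8), 1.5 * np.ones(8))
```

High scores flag anomalies, which is why such methods need no labeled abnormal samples at training time.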
Convolutional neural networks for multimodal remote sensing data classification
In recent years, enormous efforts have been made to improve the classification
performance of single-modal remote sensing (RS) data. However, with the ever-growing …