A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is …
Self-supervised representation learning: Introduction, advances, and challenges
Self-supervised representation learning (SSRL) methods aim to provide powerful, deep
feature learning without the requirement of large annotated data sets, thus alleviating the …
ImageBind: One embedding space to bind them all
We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …
Masked siamese networks for label-efficient learning
Abstract We propose Masked Siamese Networks (MSN), a self-supervised learning
framework for learning image representations. Our approach matches the representation of …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
ST-Adapter: Parameter-efficient image-to-video transfer learning
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …
StableRep: Synthetic images from text-to-image models make strong visual representation learners
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …
CrossPoint: Self-supervised cross-modal contrastive learning for 3D point cloud understanding
M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud datasets for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …
Fake it till you make it: Learning transferable representations from synthetic ImageNet clones
Recent image generation models such as Stable Diffusion have exhibited an impressive
ability to generate fairly realistic images starting from a simple text prompt. Could such …
FILIP: Fine-grained interactive language-image pre-training
Unsupervised large-scale vision-language pre-training has shown promising advances on
various downstream tasks. Existing methods often model the cross-modal interaction either …