A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …

A comparison review of transfer learning and self-supervised learning: Definitions, applications, advantages and limitations

Z Zhao, L Alzubaidi, J Zhang, Y Duan, Y Gu - Expert Systems with …, 2024 - Elsevier
Deep learning has emerged as a powerful tool in various domains, revolutionising machine
learning research. However, one persistent challenge is the scarcity of labelled training …

iBOT: Image BERT pre-training with online tokenizer

J Zhou, C Wei, H Wang, W Shen, C Xie, A Yuille… - arXiv preprint arXiv …, 2021 - arxiv.org
The success of language Transformers is primarily attributed to the pretext task of masked
language modeling (MLM), where texts are first tokenized into semantically meaningful …
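
As a rough illustration of the masked-modeling pretext task this snippet refers to, the toy PyTorch sketch below masks a subset of token ids and trains a small Transformer encoder to predict the originals at the masked positions. All sizes, the reserved mask id, and the model are placeholder assumptions, not iBOT's actual setup.

import torch
import torch.nn as nn

vocab_size, seq_len, dim = 1000, 16, 64                   # assumed toy sizes
mask_id = 0                                               # id reserved for the [MASK] token (assumption)

tokens = torch.randint(1, vocab_size, (2, seq_len))       # a small batch of token ids
mask = torch.zeros_like(tokens, dtype=torch.bool)
mask[:, ::4] = True                                       # hide every 4th position (real MLM masks ~15% at random)
corrupted = tokens.masked_fill(mask, mask_id)

embed = nn.Embedding(vocab_size, dim)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
head = nn.Linear(dim, vocab_size)

logits = head(encoder(embed(corrupted)))                  # predict the original id at every position
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])   # score only the masked positions
loss.backward()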

Context autoencoder for self-supervised representation learning

X Chen, M Ding, X Wang, Y Xin, S Mo, Y Wang… - International Journal of …, 2024 - Springer
We present a novel masked image modeling (MIM) approach, context autoencoder (CAE),
for self-supervised representation pretraining. We pretrain an encoder by making predictions …
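
To make the masked image modeling idea in this snippet concrete, here is a generic MIM sketch (not the CAE architecture itself): image patches are embedded, some are replaced by a learned mask token, and the encoder is trained to reconstruct the pixels of the hidden patches. All shapes and modules are illustrative assumptions.

import torch
import torch.nn as nn

patch, dim = 8, 64
img = torch.rand(1, 3, 32, 32)                                           # toy image
p = img.unfold(2, patch, patch).unfold(3, patch, patch)                   # cut into 8x8 patches
patches = p.permute(0, 2, 3, 1, 4, 5).reshape(1, -1, 3 * patch * patch)   # (1, 16, 192)
num = patches.shape[1]

mask = torch.zeros(num, dtype=torch.bool)
mask[::2] = True                                                          # hide half the patches

proj = nn.Linear(3 * patch * patch, dim)
mask_token = nn.Parameter(torch.zeros(dim))
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True), num_layers=2)
decoder = nn.Linear(dim, 3 * patch * patch)

tokens = torch.where(mask.view(1, -1, 1), mask_token.view(1, 1, -1), proj(patches))
pred = decoder(encoder(tokens))                                           # pixel predictions per patch
loss = nn.functional.mse_loss(pred[:, mask], patches[:, mask])            # reconstruct only hidden patches
loss.backward()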

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Unified contrastive learning in image-text-label space

J Yang, C Li, P Zhang, B Xiao, C Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Visual recognition has recently been learned via either supervised learning on human-annotated
image-label data or language-image contrastive learning with web-crawled image-text …
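
For reference, a minimal sketch of the second paradigm mentioned here, language-image contrastive learning with a symmetric InfoNCE-style loss. The random features stand in for image and text encoders, and the temperature is an arbitrary assumption; this is not the paper's unified image-text-label formulation.

import torch
import torch.nn.functional as F

batch, dim, tau = 4, 32, 0.07                               # assumed toy batch size, feature dim, temperature
img_feat = F.normalize(torch.randn(batch, dim), dim=-1)     # stand-in for image-encoder outputs
txt_feat = F.normalize(torch.randn(batch, dim), dim=-1)     # stand-in for text-encoder outputs

logits = img_feat @ txt_feat.t() / tau                      # cosine similarity of every image-text pair
targets = torch.arange(batch)                               # matched pairs sit on the diagonal
loss = (F.cross_entropy(logits, targets) +                  # image-to-text direction
        F.cross_entropy(logits.t(), targets)) / 2           # text-to-image direction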

Self-supervised learning for recommender systems: A survey

J Yu, H Yin, X Xia, T Chen, J Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In recent years, neural architecture-based recommender systems have achieved
tremendous success, but they still fall short of expectations when dealing with highly sparse …

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …