A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

A survey on contrastive self-supervised learning

A Jaiswal, AR Babu, MZ Zadeh, D Banerjee… - Technologies, 2020 - mdpi.com
Self-supervised learning has gained popularity because of its ability to avoid the cost of
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …

Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Spatiotemporal contrastive video representation learning

R Qian, T Meng, B Gong, MH Yang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to
learn spatiotemporal visual representations from unlabeled videos. Our representations are …

Trustworthy AI: From principles to practices

B Li, P Qi, B Liu, S Di, J Liu, J Pei, J Yi… - ACM Computing Surveys, 2023 - dl.acm.org
The rapid development of Artificial Intelligence (AI) technology has enabled the deployment
of various systems based on it. However, many current AI systems are found vulnerable to …

TCGL: Temporal contrastive graph for self-supervised video representation learning

Y Liu, K Wang, L Liu, H Lan, L Lin - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video self-supervised learning is a challenging task, which requires significant expressive
power from the model to leverage rich spatial-temporal knowledge and generate effective …

TCLR: Temporal contrastive learning for video representation

I Dave, R Gupta, MN Rizve, M Shah - Computer Vision and Image …, 2022 - Elsevier
Contrastive learning has nearly closed the gap between supervised and self-supervised
learning of image representations, and has also been explored for videos. However, prior …

Stand-alone inter-frame attention in video models

F Long, Z Qiu, Y Pan, T Yao, J Luo… - Proceedings of the …, 2022 - openaccess.thecvf.com
Motion, as the uniqueness of a video, has been critical to the development of video
understanding models. Modern deep learning models leverage motion by either executing …

Contrast and order representations for video self-supervised learning

K Hu, J Shao, Y Liu, B Raj… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper studies the problem of learning self-supervised representations on videos. In
contrast to image modality that only requires appearance information on objects or scenes …

Rethinking self-supervised correspondence learning: A video frame-level similarity perspective

J Xu, X Wang - Proceedings of the IEEE/CVF International …, 2021 - openaccess.thecvf.com
Learning a good representation for space-time correspondence is the key for various
computer vision tasks, including tracking object bounding boxes and performing video …