A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
A survey on contrastive self-supervised learning
Self-supervised learning has gained popularity because of its ability to avoid the cost of
annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as …
Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
Spatiotemporal contrastive video representation learning
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to
learn spatiotemporal visual representations from unlabeled videos. Our representations are …
Trustworthy AI: From principles to practices
The rapid development of Artificial Intelligence (AI) technology has enabled the deployment
of various systems based on it. However, many current AI systems are found vulnerable to …
TCGL: Temporal contrastive graph for self-supervised video representation learning
Video self-supervised learning is a challenging task, which requires significant expressive
power from the model to leverage rich spatial-temporal knowledge and generate effective …
TCLR: Temporal contrastive learning for video representation
Contrastive learning has nearly closed the gap between supervised and self-supervised
learning of image representations, and has also been explored for videos. However, prior …
Stand-alone inter-frame attention in video models
Motion, as the defining characteristic of video, has been critical to the development of video
understanding models. Modern deep learning models leverage motion by either executing …
Contrast and order representations for video self-supervised learning
This paper studies the problem of learning self-supervised representations on videos. In
contrast to the image modality, which only requires appearance information on objects or scenes …
Rethinking self-supervised correspondence learning: A video frame-level similarity perspective
Learning a good representation for space-time correspondence is the key for various
computer vision tasks, including tracking object bounding boxes and performing video …