Test of time: Instilling video-language models with a sense of time
Modelling and understanding time remains a challenge in contemporary video
understanding models. With language emerging as a key driver towards powerful …
Video contrastive learning with global context
Contrastive learning has revolutionized the self-supervised image representation learning
field and recently been adapted to the video domain. One of the greatest advantages of …
L-dawa: Layer-wise divergence aware weight aggregation in federated self-supervised visual representation learning
YAU Rehman, Y Gao… - Proceedings of the …, 2023 - openaccess.thecvf.com
The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data
being produced at the edge. The integration of self-supervised learning (SSL) and federated …
Masked motion encoding for self-supervised video representation learning
How to learn discriminative video representation from unlabeled videos is challenging but
crucial for video analysis. The latest attempts seek to learn a representation model by …
A systematic literature review of visual feature learning: deep learning techniques, applications, challenges and future directions
Visual Feature Learning (VFL) is a critical area of research in computer vision that
involves the automatic extraction of features and patterns from images and videos. The …
Transrank: Self-supervised video representation learning via ranking-based transformation recognition
Recognizing transformation types applied to a video clip (RecogTrans) is a long-established
paradigm for self-supervised video representation learning, which achieves much inferior …
Temporal action localization in the deep learning era: A survey
B Wang, Y Zhao, L Yang, T Long… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The temporal action localization research aims to discover action instances from untrimmed
videos, representing a fundamental step in the field of intelligent video understanding. With …
Self-supervised video representation learning using improved instance-wise contrastive learning and deep clustering
Instance-wise contrastive learning (Instance-CL), which learns to map similar instances
closer and different instances farther apart in the embedding space, has achieved …
Self-supervised spatiotemporal representation learning by exploiting video continuity
Recent self-supervised video representation learning methods have found significant
success by exploring essential properties of videos, e.g., speed, temporal order, etc. This work …
Pose-based contrastive learning for domain agnostic activity representations
While recognition accuracies of video classification models trained on conventional
benchmarks are gradually saturating, recent studies raise alarm about the learned …