Test of time: Instilling video-language models with a sense of time

P Bagad, M Tapaswi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Modelling and understanding time remains a challenge in contemporary video
understanding models. With language emerging as a key driver towards powerful …

Video contrastive learning with global context

H Kuang, Y Zhu, Z Zhang, X Li… - Proceedings of the …, 2021 - openaccess.thecvf.com
Contrastive learning has revolutionized the self-supervised image representation learning
field and recently been adapted to the video domain. One of the greatest advantages of …

L-DAWA: Layer-wise divergence aware weight aggregation in federated self-supervised visual representation learning

YAU Rehman, Y Gao… - Proceedings of the …, 2023 - openaccess.thecvf.com
The ubiquity of camera-enabled devices has led to large amounts of unlabeled image data
being produced at the edge. The integration of self-supervised learning (SSL) and federated …

Masked motion encoding for self-supervised video representation learning

X Sun, P Chen, L Chen, C Li, TH Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
How to learn discriminative video representation from unlabeled videos is challenging but
crucial for video analysis. The latest attempts seek to learn a representation model by …

A systematic literature review of visual feature learning: deep learning techniques, applications, challenges and future directions

M Abdullahi, ON Oyelade, AFD Kana… - Multimedia Tools and …, 2024 - Springer
Visual Feature Learning (VFL) is a critical area of research in computer vision that
involves the automatic extraction of features and patterns from images and videos. The …

TransRank: Self-supervised video representation learning via ranking-based transformation recognition

H Duan, N Zhao, K Chen, D Lin - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recognizing transformation types applied to a video clip (RecogTrans) is a long-established
paradigm for self-supervised video representation learning, which achieves much inferior …

Temporal action localization in the deep learning era: A survey

B Wang, Y Zhao, L Yang, T Long… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The temporal action localization research aims to discover action instances from untrimmed
videos, representing a fundamental step in the field of intelligent video understanding. With …

Self-supervised video representation learning using improved instance-wise contrastive learning and deep clustering

Y Zhu, H Shuai, G Liu, Q Liu - IEEE Transactions on Circuits …, 2022 - ieeexplore.ieee.org
Instance-wise contrastive learning (Instance-CL), which learns to map similar instances
closer and different instances farther apart in the embedding space, has achieved …

Self-supervised spatiotemporal representation learning by exploiting video continuity

H Liang, N Quader, Z Chi, L Chen, P Dai, J Lu… - Proceedings of the …, 2022 - ojs.aaai.org
Recent self-supervised video representation learning methods have found significant
success by exploring essential properties of videos, e.g., speed, temporal order, etc. This work …

Pose-based contrastive learning for domain agnostic activity representations

D Schneider, S Sarfraz, A Roitberg… - Proceedings of the …, 2022 - openaccess.thecvf.com
While recognition accuracies of video classification models trained on conventional
benchmarks are gradually saturating, recent studies raise alarm about the learned …