Self-supervised video transformer

K Ranasinghe, M Naseer, S Khan… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we propose self-supervised training for video transformers using unlabeled
video data. From a given video, we create local and global spatiotemporal views with …

DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels

JP Bohnslav, NK Wimalasena, KJ Clausing, YY Dai… - eLife, 2021 - elifesciences.org
Videos of animal behavior are used to quantify researcher-defined behaviors of interest to
study neural function, gene mutations, and pharmacological therapies. Behaviors of interest …

MS-TCT: Multi-scale temporal ConvTransformer for action detection

R Dai, S Das, K Kahatapitiya… - Proceedings of the …, 2022 - openaccess.thecvf.com
Action detection is an essential and challenging task, especially for densely labelled
datasets of untrimmed videos. The temporal relation is complex in those datasets, including …

Learning multi-granular spatio-temporal graph network for skeleton-based action recognition

T Chen, D Zhou, J Wang, S Wang, Y Guan… - Proceedings of the 29th …, 2021 - dl.acm.org
The task of skeleton-based action recognition remains a core challenge in human-centred
scene understanding due to the multiple granularities and large variation in human motion …

UniMD: Towards unifying moment retrieval and temporal action detection

Y Zeng, Y Zhong, C Feng, L Ma - European Conference on Computer …, 2025 - Springer
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while
Moment Retrieval (MR) aims to identify the events described by open-ended natural …

SPARTAN: Self-supervised spatiotemporal transformers approach to group activity recognition

NVS Chappa, P Nguyen, AH Nelson… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we propose a new, simple, and effective Self-supervised Spatio-temporal
Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled …

Token Turing machines

MS Ryoo, K Gopalakrishnan… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Token Turing Machines (TTM), a sequential, autoregressive
Transformer model with memory for real-world sequential visual understanding. Our model …

PointTAD: Multi-label temporal action detection with learnable query points

J Tan, X Zhao, X Shi, B Kang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Traditional temporal action detection (TAD) usually handles untrimmed videos with small
number of action instances from a single label (e.g., ActivityNet, THUMOS). However, this …

PAT: Position-aware transformer for dense multi-label action detection

F Sardari, A Mustafa, PJB Jackson… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present PAT, a transformer-based network that learns complex temporal co-occurrence
action dependencies in a video by exploiting multi-scale temporal features. In existing …

Action sensitivity learning for temporal action localization

J Shao, X Wang, R Quan, J Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Temporal action localization (TAL), which involves recognizing and locating action
instances, is a challenging task in video understanding. Most existing approaches directly …