Self-supervised video transformer
In this paper, we propose self-supervised training for video transformers using unlabeled
video data. From a given video, we create local and global spatiotemporal views with …
DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels
Videos of animal behavior are used to quantify researcher-defined behaviors of interest to
study neural function, gene mutations, and pharmacological therapies. Behaviors of interest …
MS-TCT: Multi-scale temporal ConvTransformer for action detection
Action detection is an essential and challenging task, especially for densely labelled
datasets of untrimmed videos. The temporal relation is complex in those datasets, including …
Learning multi-granular spatio-temporal graph network for skeleton-based action recognition
The task of skeleton-based action recognition remains a core challenge in human-centred
scene understanding due to the multiple granularities and large variation in human motion …
UniMD: Towards unifying moment retrieval and temporal action detection
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while
Moment Retrieval (MR) aims to identify the events described by open-ended natural …
SPARTAN: Self-supervised spatiotemporal transformers approach to group activity recognition
In this paper, we propose a new, simple, and effective Self-supervised Spatio-temporal
Transformers (SPARTAN) approach to Group Activity Recognition (GAR) using unlabeled …
Token Turing Machines
MS Ryoo, K Gopalakrishnan… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Token Turing Machines (TTM), a sequential, autoregressive
Transformer model with memory for real-world sequential visual understanding. Our model …
PointTAD: Multi-label temporal action detection with learnable query points
J Tan, X Zhao, X Shi, B Kang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Traditional temporal action detection (TAD) usually handles untrimmed videos with a small
number of action instances from a single label (e.g., ActivityNet, THUMOS). However, this …
PAT: Position-aware transformer for dense multi-label action detection
We present PAT, a transformer-based network that learns complex temporal co-occurrence
action dependencies in a video by exploiting multi-scale temporal features. In existing …
Action sensitivity learning for temporal action localization
Temporal action localization (TAL), which involves recognizing and locating action
instances, is a challenging task in video understanding. Most existing approaches directly …