Vision transformers for action recognition: A survey

A Ulhaq, N Akhtar, G Pogrebna, A Mian - arXiv preprint arXiv:2209.05700, 2022 - arxiv.org
Vision transformers are emerging as a powerful tool to solve computer vision problems.
Recent techniques have also proven the efficacy of transformers beyond the image domain …

MS-TCT: Multi-scale temporal ConvTransformer for action detection

R Dai, S Das, K Kahatapitiya… - Proceedings of the …, 2022 - openaccess.thecvf.com
Action detection is an essential and challenging task, especially for densely labelled
datasets of untrimmed videos. The temporal relation is complex in those datasets, including …

UniMD: Towards unifying moment retrieval and temporal action detection

Y Zeng, Y Zhong, C Feng, L Ma - European Conference on Computer …, 2025 - Springer
Temporal Action Detection (TAD) focuses on detecting pre-defined actions, while
Moment Retrieval (MR) aims to identify the events described by open-ended natural …

Token Turing machines

MS Ryoo, K Gopalakrishnan… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Token Turing Machines (TTM), a sequential, autoregressive
Transformer model with memory for real-world sequential visual understanding. Our model …

Does self-supervised learning really improve reinforcement learning from pixels?

X Li, J Shang, S Das, M Ryoo - Advances in Neural …, 2022 - proceedings.neurips.cc
We investigate whether self-supervised learning (SSL) can improve online reinforcement
learning (RL) from pixels. We extend the contrastive reinforcement learning framework (e.g., …

Two-dimensional and three-dimensional CNN-based simultaneous detection and activity classification of construction workers

G Torabi, A Hammad, N Bouguila - Journal of Computing in Civil …, 2022 - ascelibrary.org
The type and duration of construction workers' activities are useful information for project
management purposes. Therefore, several studies have used surveillance cameras and …

AdaFocus: Towards end-to-end weakly supervised learning for long-video action understanding

J Zhou, H Li, KY Lin, J Liang - arXiv preprint arXiv:2311.17118, 2023 - arxiv.org
Developing end-to-end models for long-video action understanding tasks presents
significant computational and memory challenges. Existing works generally build models on …

AAN: Attributes-aware network for temporal action detection

R Dai, S Das, MS Ryoo, F Bremond - arXiv preprint arXiv:2309.00696, 2023 - arxiv.org
The challenge of long-term video understanding remains constrained by the efficient
extraction of object semantics and the modelling of their relationships for downstream tasks …

Test-Time Mixup Augmentation for Data and Class-Dependent Uncertainty Estimation in Deep Learning Image Classification

H Lee, H Lee, H Hong, J Kim - arXiv preprint arXiv:2212.00214, 2022 - arxiv.org
Uncertainty estimation of the trained deep learning networks is valuable for optimizing
learning efficiency and evaluating the reliability of network predictions. In this paper, we …

Productivity Monitoring of Construction Workers Based on Spatiotemporal Activity Recognition

G Torabi - 2022 - spectrum.library.concordia.ca
Workers' productivity monitoring is an essential but time-consuming part of large
construction projects. Therefore, automating this process using surveillance cameras has …