A survey on deep learning for human activity recognition

F Gu, MH Chung, M Chignell, S Valaee… - ACM Computing …, 2021 - dl.acm.org
Human activity recognition is key to many applications, such as healthcare and smart
homes. In this study, we provide a comprehensive survey on recent advances and challenges …

Next-generation deep learning based on simulators and synthetic data

CM de Melo, A Torralba, L Guibas, J DiCarlo… - Trends in cognitive …, 2022 - cell.com
Deep learning (DL) is being successfully applied across multiple domains, yet these models
learn in a most artificial way: they require large quantities of labeled data to grasp even …

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Affordance diffusion: Synthesizing hand-object interactions

Y Ye, X Li, A Gupta, S De Mello… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent successes in image synthesis are powered by large-scale diffusion models.
However, most methods are currently limited to either text- or image-conditioned generation …

Towards long-form video understanding

CY Wu, P Krähenbühl - … of the IEEE/CVF Conference on …, 2021 - openaccess.thecvf.com
Our world offers a never-ending stream of visual stimuli, yet today's vision systems only
accurately recognize patterns within a few seconds. These systems understand the present …

Joint hand motion and interaction hotspots prediction from egocentric videos

S Liu, S Tripathi, S Majumdar… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
We propose to forecast future hand-object interactions given an egocentric video. Instead of
predicting action labels or pixels, we directly predict the hand motion trajectory and the …

Video-mined task graphs for keystep recognition in instructional videos

K Ashutosh, SK Ramakrishnan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Procedural activity understanding requires perceiving human actions in terms of a broader
task, where multiple keysteps are performed in sequence across a long video to reach a …

The epic-kitchens dataset: Collection, challenges and baselines

D Damen, H Doughty, GM Farinella… - … on Pattern Analysis …, 2020 - ieeexplore.ieee.org
Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest
egocentric video benchmark, offering a unique viewpoint on people's interaction with …

Real-time online video detection with temporal smoothing transformers

Y Zhao, P Krähenbühl - European Conference on Computer Vision, 2022 - Springer
Streaming video recognition reasons about objects and their actions in every frame of a
video. A good streaming recognition model captures both long-term dynamics and short …