Multi-moments in time: Learning and interpreting models for multi-action video understanding

R Yue, Z Tian, S Du - Neurocomputing, 2022 - Elsevier

Action recognition is a major branch of computer vision research. As a widely used
technology, action recognition has been applied to human–computer interaction, intelligent …

Cited by 45 · Related articles · All 2 versions

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …

Cited by 74 · Related articles · All 3 versions

InternVideo: General video foundation models via generative and discriminative learning

Y Wang, K Li, Y Li, Y He, B Huang, Z Zhao… - arXiv preprint arXiv …, 2022 - arxiv.org

The foundation models have recently shown excellent performance on a variety of
downstream tasks in computer vision. However, most existing vision foundation models …

Cited by 245 · Related articles · All 2 versions

Prompting visual-language models for efficient video understanding

C Ju, T Han, K Zheng, Y Zhang, W Xie - European Conference on …, 2022 - Springer

Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …

Cited by 337 · Related articles · All 6 versions

Evidential deep learning for open set action recognition

W Bao, Q Yu, Y Kong - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

In a real-world scenario, human actions are typically out of the distribution from training data,
which requires a model to both recognize the known actions and reject the unknown …

Cited by 147 · Related articles · All 5 versions

Human-to-robot imitation in the wild

S Bahl, A Gupta, D Pathak - arXiv preprint arXiv:2207.09450, 2022 - arxiv.org

We approach the problem of learning by watching humans in the wild. While traditional
approaches in Imitation and Reinforcement Learning are promising for learning in the real …

Cited by 113 · Related articles · All 4 versions

A comprehensive study of deep video action recognition

Y Zhu, X Li, C Liu, M Zolfaghari, Y Xiong, C Wu… - arXiv preprint arXiv …, 2020 - arxiv.org

Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …

Cited by 211 · Related articles · All 2 versions

SoccerNet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos

A Deliege, A Cioppa, S Giancola… - Proceedings of the …, 2021 - openaccess.thecvf.com

Understanding broadcast videos is a challenging task in computer vision, as it requires
generic reasoning capabilities to appreciate the content offered by the video editing. In this …

Cited by 148 · Related articles · All 14 versions

AR-Net: Adaptive frame resolution for efficient action recognition

Y Meng, CC Lin, R Panda, P Sattigeri… - Computer Vision–ECCV …, 2020 - Springer

Action recognition is an open and challenging problem in computer vision. While current
state-of-the-art models offer excellent recognition results, their computational expense limits …

Cited by 162 · Related articles · All 8 versions

DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels

JP Bohnslav, NK Wimalasena, KJ Clausing, YY Dai… - eLife, 2021 - elifesciences.org

Videos of animal behavior are used to quantify researcher-defined behaviors of interest to
study neural function, gene mutations, and pharmacological therapies. Behaviors of interest …

Cited by 146 · Related articles · All 14 versions