Action recognition based on RGB and skeleton data sets: A survey
Action recognition is a major branch of computer vision research. As a widely used
technology, action recognition has been applied to human–computer interaction, intelligent …
technology, action recognition has been applied to human–computer interaction, intelligent …
Sequential modeling enables scalable learning for large vision models
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …
Model (LVM) without making use of any linguistic data. To do this we define a common …
Internvideo: General video foundation models via generative and discriminative learning
The foundation models have recently shown excellent performance on a variety of
downstream tasks in computer vision. However, most existing vision foundation models …
downstream tasks in computer vision. However, most existing vision foundation models …
Prompting visual-language models for efficient video understanding
Image-based visual-language (I-VL) pre-training has shown great success for learning joint
visual-textual representations from large-scale web data, revealing remarkable ability for …
visual-textual representations from large-scale web data, revealing remarkable ability for …
Evidential deep learning for open set action recognition
In a real-world scenario, human actions are typically out of the distribution from training data,
which requires a model to both recognize the known actions and reject the unknown …
which requires a model to both recognize the known actions and reject the unknown …
Human-to-robot imitation in the wild
We approach the problem of learning by watching humans in the wild. While traditional
approaches in Imitation and Reinforcement Learning are promising for learning in the real …
approaches in Imitation and Reinforcement Learning are promising for learning in the real …
A comprehensive study of deep video action recognition
Video action recognition is one of the representative tasks for video understanding. Over the
last decade, we have witnessed great advancements in video action recognition thanks to …
last decade, we have witnessed great advancements in video action recognition thanks to …
Soccernet-v2: A dataset and benchmarks for holistic understanding of broadcast soccer videos
Understanding broadcast videos is a challenging task in computer vision, as it requires
generic reasoning capabilities to appreciate the content offered by the video editing. In this …
generic reasoning capabilities to appreciate the content offered by the video editing. In this …
Ar-net: Adaptive frame resolution for efficient action recognition
Action recognition is an open and challenging problem in computer vision. While current
state-of-the-art models offer excellent recognition results, their computational expense limits …
state-of-the-art models offer excellent recognition results, their computational expense limits …
DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels
Videos of animal behavior are used to quantify researcher-defined behaviors of interest to
study neural function, gene mutations, and pharmacological therapies. Behaviors of interest …
study neural function, gene mutations, and pharmacological therapies. Behaviors of interest …