Sequential modeling enables scalable learning for large vision models
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …
Model (LVM) without making use of any linguistic data. To do this we define a common …
Review of dynamic gesture recognition
SHI Yuanyuan, LI Yunan, FU Xiaolong, M Kaibin… - Virtual Reality & …, 2021 - Elsevier
In recent years, gesture recognition has been widely used in the fields of intelligent driving,
virtual reality, and human-computer interaction. With the development of artificial …
virtual reality, and human-computer interaction. With the development of artificial …
Recurring the transformer for video action recognition
Existing video understanding approaches, such as 3D convolutional neural networks and
Transformer-Based methods, usually process the videos in a clip-wise manner. Hence huge …
Transformer-Based methods, usually process the videos in a clip-wise manner. Hence huge …
Tsm: Temporal shift module for efficient video understanding
The explosive growth in video streaming gives rise to challenges on performing video
understanding at high accuracy and low computation cost. Conventional 2D CNNs are …
understanding at high accuracy and low computation cost. Conventional 2D CNNs are …
Action-net: Multipath excitation for action recognition
Abstract Spatial-temporal, channel-wise, and motion patterns are three complementary and
crucial types of information for video action recognition. Conventional 2D CNNs are …
crucial types of information for video action recognition. Conventional 2D CNNs are …
BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues
Recent progress in fine-grained gesture and action classification, and machine translation,
point to the possibility of automated sign language recognition becoming a reality. A key …
point to the possibility of automated sign language recognition becoming a reality. A key …
Direcformer: A directed attention in transformer approach to robust action recognition
Human action recognition has recently become one ofthe popular research topics in the
computer vision community. Various 3D-CNN based methods have been presented to tackle …
computer vision community. Various 3D-CNN based methods have been presented to tackle …
Something-else: Compositional action recognition with spatial-temporal interaction networks
Human action is naturally compositional: humans can easily recognize and perform actions
with objects that are different from those used in training demonstrations. In this paper, we …
with objects that are different from those used in training demonstrations. In this paper, we …
Attention clusters: Purely attention based local feature integration for video classification
Recently, substantial research effort has focused on how to apply CNNs or RNNs to better
capture temporal patterns in videos, so as to improve the accuracy of video classification. In …
capture temporal patterns in videos, so as to improve the accuracy of video classification. In …