A review of vision-based traffic semantic understanding in ITSs

J Chen, Q Wang, HH Cheng, W Peng… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
A semantic understanding of road traffic can help people understand road traffic flow
situations and emergencies more accurately and provide a more accurate basis for anomaly …

Anticipative video transformer

R Girdhar, K Grauman - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
We propose Anticipative Video Transformer (AVT), an end-to-end attention-based
video modeling architecture that attends to the previously observed video in order to …

Region attention networks for pose and occlusion robust facial expression recognition

K Wang, X Peng, J Yang, D Meng… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Occlusion and pose variations, which can change facial appearance significantly, are two
major obstacles for automatic Facial Expression Recognition (FER). Though automatic FER …

TSM: Temporal shift module for efficient video understanding

J Lin, C Gan, S Han - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
The explosive growth in video streaming gives rise to challenges on performing video
understanding at high accuracy and low computation cost. Conventional 2D CNNs are …

Group-aware label transfer for domain adaptive person re-identification

K Zheng, W Liu, L He, T Mei, J Luo… - Proceedings of the …, 2021 - openaccess.thecvf.com
Unsupervised Domain Adaptive (UDA) person re-identification (ReID) aims at
adapting the model trained on a labeled source-domain dataset to a target-domain dataset …

Video action transformer network

R Girdhar, J Carreira, C Doersch… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce the Action Transformer model for recognizing and localizing human
actions in video clips. We repurpose a Transformer-style architecture to aggregate features …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

EPIC-Fusion: Audio-visual temporal binding for egocentric action recognition

E Kazakos, A Nagrani, A Zisserman… - Proceedings of the …, 2019 - openaccess.thecvf.com
We focus on multi-modal fusion for egocentric action recognition, and propose a novel
architecture for multi-modal temporal-binding, i.e., the combination of modalities within a …

Listen to look: Action recognition by previewing audio

R Gao, TH Oh, K Grauman… - Proceedings of the …, 2020 - openaccess.thecvf.com
In the face of the video data deluge, today's expensive clip-level classifiers are increasingly
impractical. We propose a framework for efficient action recognition in untrimmed video that …

Audiovisual SlowFast networks for video recognition

F Xiao, YJ Lee, K Grauman, J Malik… - arXiv preprint arXiv …, 2020 - arxiv.org
We present Audiovisual SlowFast Networks, an architecture for integrated audiovisual
perception. AVSlowFast has Slow and Fast visual pathways that are deeply integrated with a …