Enhanced attention tracking with multi-branch network for egocentric activity recognition

T Liu, KM Lam, R Zhao, J Kong - IEEE Transactions on Circuits …, 2021 - ieeexplore.ieee.org
The emergence of wearable devices has opened up new potentials for egocentric activity
recognition. Although some methods integrate attention mechanisms into deep neural …

ERM: Energy-based refined-attention mechanism for video question answering

F Zhang, R Wang, F Zhou, Y Luo - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Spatiotemporal attention learning remains a challenging video question answering
(VideoQA) task as it requires a sufficient understanding of cross-modal spatiotemporal …

Watch, listen, and answer: Open-ended VideoQA with modulated multi-stream 3D ConvNets

T Miyanishi, M Kawanabe - 2021 29th European Signal …, 2021 - ieeexplore.ieee.org
We propose an open-ended multimodal video question answering (VideoQA) method that
predicts textual answers by referring to multimodal information derived from videos. Most …

Fusing Temporally Distributed Multi-Modal Semantic Clues for Video Question Answering

F Zhang, R Wang, S Xu, F Zhou - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Video Question Answering (VideoQA) is an intriguing topic, attracting increasing interest
among the broad AI community. Yet VideoQA is a difficult task. An algorithm competently …

Spatio-Temporal Attention Mechanisms in Video-Based Visual Question Answering: A Comprehensive Review

DSVP Ritvik, ASV Praneel, P Ramaiah - jcse.cloud
Video Question Answering (VideoQA) has become an essential job in computer vision and
natural language processing interfaces, requiring models incorporating spatio-temporal …

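Video question answering integrating appearance, motion, and audio information with multi-stream 3D convolutional networks

宮西大樹, 川鍋一晃 - Proceedings of the 35th Annual Conference of the Japanese Society for Artificial Intelligence (2021), 2021 - jstage.jst.go.jp
Abstract: In this study, we propose an open-ended multimodal video question answering method that
answers questions about videos by using appearance, motion, and audio information simultaneously. For understanding video content, audio information …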