Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
A comprehensive survey on video saliency detection with auditory information: the audio-visual consistency perceptual is the key!
Video saliency detection (VSD) aims at fast locating the most attractive
objects/things/patterns in a given video clip. Existing VSD-related works have mainly relied …
Video saliency forecasting transformer
Video saliency prediction (VSP) aims to imitate eye fixations of humans. However, the
potential of this task has not been fully exploited since existing VSP methods only focus on …
Transformer-based multi-scale feature integration network for video saliency prediction
Most cutting-edge video saliency prediction models rely on spatiotemporal features
extracted by 3D convolutions due to its local contextual cues acquirement ability. However …
Spatio-temporal self-attention network for video saliency prediction
3D convolutional neural networks have achieved promising results for video tasks in
computer vision, including video saliency prediction that is explored in this paper. However …
CASP-Net: Rethinking video saliency prediction from an audio-visual consistency perceptual perspective
J Xiong, G Wang, P Zhang, W Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Incorporating the audio stream enables Video Saliency Prediction (VSP) to imitate the
selective attention mechanism of human brain. By focusing on the benefits of joint auditory …
ECANet: Explicit cyclic attention-based network for video saliency prediction
H Xue, M Sun, Y Liang - Neurocomputing, 2022 - Elsevier
Video saliency prediction has received increasing attention in the field of computer vision
research. How to model the spatio-temporal information in video frames is a key issue for …
Joint learning of audio–visual saliency prediction and sound source localization on multi-face videos
Visual and audio events simultaneously occur and both attract attention. However, most
existing saliency prediction works ignore the influence of audio and only consider vision …
Multi-scale spatiotemporal feature fusion network for video saliency prediction
Y Zhang, T Zhang, C Wu, R Tao - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Recently, video saliency prediction has attracted increasing attention, yet the improvement
of its accuracy is still subject to the insufficient use of multi-scale spatiotemporal features. To …
CAD-contextual multi-modal alignment for dynamic AVQA
In the context of Audio Visual Question Answering (AVQA) tasks, the audio and visual
modalities could be learnt on three levels: 1) Spatial, 2) Temporal, and 3) Semantic. Existing …