Learning in audio-visual context: A review, analysis, and new perspective

Y Wei, D Hu, Y Tian, X Li - arXiv preprint arXiv:2208.09579, 2022 - arxiv.org
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …

Vision transformers are parameter-efficient audio-visual learners

YB Lin, YL Sung, J Lei, M Bansal… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers (ViTs) have achieved impressive results on various computer vision
tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained …

Binding touch to everything: Learning unified multimodal tactile representations

F Yang, C Feng, Z Chen, H Park… - Proceedings of the …, 2024 - openaccess.thecvf.com
The ability to associate touch with other modalities has huge implications for humans and
computational systems. However, multimodal learning with touch remains challenging due to …

Audio-visual class-incremental learning

W Pian, S Mo, Y Guo, Y Tian - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In this paper, we introduce audio-visual class-incremental learning, a class-incremental
learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual …

Self-supervised video forensics by audio-visual anomaly detection

C Feng, Z Chen, A Owens - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Manipulated videos often contain subtle inconsistencies between their visual and audio
signals. We propose a video forensics method, based on anomaly detection, that can …

Audio-visual grouping network for sound localization from mixtures

S Mo, Y Tian - Proceedings of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Sound source localization is a typical and challenging task that predicts the location of
sound sources in a video. Previous single-source methods mainly used the audio-visual …

Class-incremental grouping network for continual audio-visual learning

S Mo, W Pian, Y Tian - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Continual learning is a challenging problem in which models need to be trained on
non-stationary data across sequential tasks for class-incremental learning. While previous …

Auto-ACD: A large-scale dataset for audio-language representation learning

L Sun, X Xu, M Wu, W Xie - Proceedings of the 32nd ACM International …, 2024 - dl.acm.org
Recently, the AI community has made significant strides in developing powerful foundation
models, driven by large-scale multimodal datasets. However, for audio representation …

Egocentric audio-visual object localization

C Huang, Y Tian, A Kumar… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines are advanced to approach human intelligence by learning with …

A survey of audio enhancement algorithms for music, speech, bioacoustics, biomedical, industrial and environmental sounds by image U-Net

S Gul, MS Khan - IEEE Access, 2023 - ieeexplore.ieee.org
The recent surge in the use of Deep Neural Networks (DNNs) has also made its mark in the
field of Audio Enhancement (AE), providing much better quality than the classical methods …