Learning in audio-visual context: A review, analysis, and new perspective
Sight and hearing are two senses that play a vital role in human communication and scene
understanding. To mimic human perception ability, audio-visual learning, aimed at …
Vision transformers are parameter-efficient audio-visual learners
Vision transformers (ViTs) have achieved impressive results on various computer vision
tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained …
Binding touch to everything: Learning unified multimodal tactile representations
The ability to associate touch with other modalities has huge implications for humans and
computational systems. However, multimodal learning with touch remains challenging due to …
Audio-visual class-incremental learning
In this paper, we introduce audio-visual class-incremental learning, a class-incremental
learning scenario for audio-visual video recognition. We demonstrate that joint audio-visual …
Self-supervised video forensics by audio-visual anomaly detection
Manipulated videos often contain subtle inconsistencies between their visual and audio
signals. We propose a video forensics method, based on anomaly detection, that can …
Audio-visual grouping network for sound localization from mixtures
S Mo, Y Tian - Proceedings of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Sound source localization is a typical and challenging task that predicts the location of
sound sources in a video. Previous single-source methods mainly used the audio-visual …
Class-incremental grouping network for continual audio-visual learning
Continual learning is a challenging problem in which models need to be trained on
non-stationary data across sequential tasks for class-incremental learning. While previous …
Auto-ACD: A large-scale dataset for audio-language representation learning
Recently, the AI community has made significant strides in developing powerful foundation
models, driven by large-scale multimodal datasets. However, for audio representation …
Egocentric audio-visual object localization
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines are advanced to approach human intelligence by learning with …
A survey of audio enhancement algorithms for music, speech, bioacoustics, biomedical, industrial and environmental sounds by image U-Net
S Gul, MS Khan - IEEE Access, 2023 - ieeexplore.ieee.org
The recent surge in the use of Deep Neural Networks (DNNs) has also made its mark in the
field of Audio Enhancement (AE), providing much better quality than the classical methods …