Beyond the status quo: A contemporary survey of advances and challenges in audio captioning

X Xu, Z Xie, M Wu, K Yu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org
Automated audio captioning (AAC), a task that mimics human perception as well as
innovatively links audio processing and natural language processing, has overseen much …

CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding

E Labbé, T Pellegrini, J Pinquier - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Automated Audio Captioning (AAC) involves generating natural language descriptions of
audio content, using encoder-decoder architectures. An audio encoder produces audio …

Diffusion-based diverse audio captioning with retrieval-guided Langevin dynamics

Y Zhu, A Men, L Xiao - Information Fusion, 2025 - Elsevier
Audio captioning, a comprehensive task of audio understanding, aims to provide a natural-
language description of an audio clip. Beyond accuracy, diversity is also a critical …

Audio Difference Learning for Audio Captioning

T Komatsu, Y Fujita, K Takeda… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This study introduces a novel training paradigm, audio difference learning, for improving
audio captioning. The fundamental concept of the proposed learning method is to create a …

Synth-AC: Enhancing audio captioning with synthetic supervision

F Xiao, Q Zhu, J Guan, X Liu, H Liu, K Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Data-driven approaches hold promise for audio captioning. However, the development of
audio captioning methods can be biased due to the limited availability and quality of text …

Multilingual Audio Captioning using machine translated data

M Cousin, E Labbé, T Pellegrini - arXiv preprint arXiv:2309.07615, 2023 - arxiv.org
Automated Audio Captioning (AAC) systems attempt to generate a natural language
sentence, a caption, that describes the content of an audio recording, in terms of sound …

SJTU-THU automated audio captioning system for DCASE 2024

W Chen, X Li, Z Ma, Y Liang, A Jiang, Z Zheng, Y Qian… - 2024 - dcase.community
ABSTRACT Task 6 (Automated Audio Captioning) of the DCASE 2024 Challenge requires
the automatic creation of textual descriptions for general audio signals. This technical report …

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

X Xu, H Liu, M Wu, W Wang, MD Plumbley - arXiv preprint arXiv …, 2024 - arxiv.org
Significant improvement has been achieved in automated audio captioning (AAC) with
recent models. However, these models have become increasingly large as their …

Automatic audio captioning with encoder fusion, multi-layer aggregation, and large language model enriched summarization

J Jung, D Zhang, CHH Yang, SL Wu, DM Chan, Z Kong… - 2024 - dcase.community
In this report, we describe our submission to Track 6 of the DCASE 2024 challenge for the
task of Automated Audio Captioning (AAC). The submitted models utilize an encoder …

Killing two birds with one stone: can an audio captioning system also be used for audio-text retrieval?

E Labbé, T Pellegrini, J Pinquier - 8th workshop on Detection and …, 2023 - hal.science
Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio
recording using a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek to …