Beyond the status quo: A contemporary survey of advances and challenges in audio captioning
Automated audio captioning (AAC), a task that mimics human perception as well as
innovatively links audio processing and natural language processing, has overseen much …
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Automated Audio Captioning (AAC) involves generating natural language descriptions of
audio content, using encoder-decoder architectures. An audio encoder produces audio …
Diffusion-based diverse audio captioning with retrieval-guided Langevin dynamics
Y Zhu, A Men, L Xiao - Information Fusion, 2025 - Elsevier
Audio captioning, a comprehensive task of audio understanding, aims to provide a natural-language
description of an audio clip. Beyond accuracy, diversity is also a critical …
Audio Difference Learning for Audio Captioning
This study introduces a novel training paradigm, audio difference learning, for improving
audio captioning. The fundamental concept of the proposed learning method is to create a …
Synth-AC: Enhancing audio captioning with synthetic supervision
Data-driven approaches hold promise for audio captioning. However, the development of
audio captioning methods can be biased due to the limited availability and quality of text …
Multilingual Audio Captioning using machine translated data
M Cousin, E Labbé, T Pellegrini - arXiv preprint arXiv:2309.07615, 2023 - arxiv.org
Automated Audio Captioning (AAC) systems attempt to generate a natural language
sentence, a caption, that describes the content of an audio recording, in terms of sound …
SJTU-THU automated audio captioning system for DCASE 2024
Task 6 (Automated Audio Captioning) of the DCASE 2024 Challenge requires
the automatic creation of textual descriptions for general audio signals. This technical report …
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Significant improvement has been achieved in automated audio captioning (AAC) with
recent models. However, these models have become increasingly large as their …
Automatic audio captioning with encoder fusion, multi-layer aggregation, and large language model enriched summarization
In this report, we describe our submission to Track 6 of the DCASE 2024 challenge for the
task of Automated Audio Captioning (AAC). The submitted models utilize an encoder …
Killing two birds with one stone: can an audio captioning system also be used for audio-text retrieval?
Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio
recording using a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek to …