BEATs-based audio captioning model with INSTRUCTOR embedding supervision and ChatGPT mix-up

X Xu, Z Xie, M Wu, K Yu - IEEE/ACM Transactions on Audio …, 2023 - ieeexplore.ieee.org

Automated audio captioning (AAC), a task that mimics human perception as well as
innovatively links audio processing and natural language processing, has overseen much …

Cited by 8 · Related articles · All 2 versions

CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding

E Labbé, T Pellegrini, J Pinquier - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org

Automated Audio Captioning (AAC) involves generating natural language descriptions of
audio content, using encoder-decoder architectures. An audio encoder produces audio …

Cited by 7 · Related articles · All 16 versions

Diffusion-based diverse audio captioning with retrieval-guided Langevin dynamics

Y Zhu, A Men, L Xiao - Information Fusion, 2025 - Elsevier

Audio captioning, a comprehensive task of audio understanding, aims to provide a natural-
language description of an audio clip. Beyond accuracy, diversity is also a critical …

Audio Difference Learning for Audio Captioning

T Komatsu, Y Fujita, K Takeda… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

This study introduces a novel training paradigm, audio difference learning, for improving
audio captioning. The fundamental concept of the proposed learning method is to create a …

Cited by 2 · Related articles · All 3 versions

Synth-AC: Enhancing audio captioning with synthetic supervision

F Xiao, Q Zhu, J Guan, X Liu, H Liu, K Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Data-driven approaches hold promise for audio captioning. However, the development of
audio captioning methods can be biased due to the limited availability and quality of text …

Cited by 1 · Related articles · All 4 versions

Multilingual Audio Captioning using machine translated data

M Cousin, E Labbé, T Pellegrini - arXiv preprint arXiv:2309.07615, 2023 - arxiv.org

Automated Audio Captioning (AAC) systems attempt to generate a natural language
sentence, a caption, that describes the content of an audio recording, in terms of sound …

Cited by 2 · Related articles · All 15 versions

SJTU-THU automated audio captioning system for DCASE 2024

W Chen, X Li, Z Ma, Y Liang, A Jiang, Z Zheng, Y Qian… - 2024 - dcase.community

Task 6 (Automated Audio Captioning) of the DCASE 2024 Challenge requires
the automatic creation of textual descriptions for general audio signals. This technical report …

Cited by 1 · Related articles

Efficient Audio Captioning with Encoder-Level Knowledge Distillation

X Xu, H Liu, M Wu, W Wang, MD Plumbley - arXiv preprint arXiv …, 2024 - arxiv.org

Significant improvement has been achieved in automated audio captioning (AAC) with
recent models. However, these models have become increasingly large as their …

Automatic audio captioning with encoder fusion, multi-layer aggregation, and large language model enriched summarization

J Jung, D Zhang, CHH Yang, SL Wu, DM Chan, Z Kong… - 2024 - dcase.community

In this report, we describe our submission to Track 6 of the DCASE 2024 challenge for the
task of Automated Audio Captioning (AAC). The submitted models utilize an encoder …

Cited by 1 · Related articles

Killing two birds with one stone: can an audio captioning system also be used for audio-text retrieval?

E Labbé, T Pellegrini, J Pinquier - 8th workshop on Detection and …, 2023 - hal.science

Automated Audio Captioning (AAC) aims to develop systems capable of describing an audio
recording using a textual sentence. In contrast, Audio-Text Retrieval (ATR) systems seek to …

Cited by 1 · Related articles · All 18 versions