A comprehensive survey of automated audio captioning

X Xu, M Wu, K Yu - arXiv preprint arXiv:2205.05357, 2022 - arxiv.org
Automated audio captioning, a task that mimics human perception as well as innovatively
links audio processing and natural language processing, has overseen much progress over …

An encoder-decoder based audio captioning system with transfer and reinforcement learning

X Mei, Q Huang, X Liu, G Chen, J Wu, Y Wu… - arXiv preprint arXiv …, 2021 - arxiv.org
Automated audio captioning aims to use natural language to describe the content of audio
data. This paper presents an audio captioning system with an encoder-decoder architecture …

ACTUAL: Audio captioning with caption feature space regularization

Y Zhang, H Yu, R Du, ZH Tan, W Wang… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
Audio captioning aims at describing the content of audio clips with human language. Due to
the ambiguity of audio content, different people may perceive the same audio clip differently …

A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.

X Xu, H Dinkel, M Wu, K Yu - DCASE, 2020 - myw19.github.io
Audio captioning aims at generating a natural sentence to describe the content in an audio
clip. This paper proposes the use of a powerful CRNN encoder combined with a GRU …

Improving the performance of automated audio captioning via integrating the acoustic and semantic information

Z Ye, H Wang, D Yang, Y Zou - arXiv preprint arXiv:2110.06100, 2021 - arxiv.org
Automated audio captioning (AAC) has developed rapidly in recent years, involving acoustic
signal processing and natural language processing to generate human-readable sentences …

The SJTU system for DCASE2021 challenge task 6: Audio captioning based on encoder pre-training and reinforcement learning

X Xu, Z Xie, M Wu, K Yu - Proc. Conf. Detection Classification …, 2021 - dcase.community
This report proposes an audio captioning system for the Detection and Classification of
Acoustic Scenes and Events (DCASE) 2021 challenge Task 6. Our audio captioning …

Wavetransformer: A novel architecture for audio captioning based on learning temporal and time-frequency information

A Tran, K Drossos, T Virtanen - arXiv preprint arXiv:2010.11098, 2020 - arxiv.org
Automated audio captioning (AAC) is a novel task, where a method takes as an input an
audio sample and outputs a textual description (i.e. a caption) of its contents. Most AAC …

Audio caption in a car setting with a sentence-level loss

X Xu, H Dinkel, M Wu, K Yu - 2021 12th International …, 2021 - ieeexplore.ieee.org
Captioning has attracted much attention in image and video understanding while a small
amount of work examines audio captioning. This paper contributes a Mandarin-annotated …

Wavetransformer: An architecture for audio captioning based on learning temporal and time-frequency information

A Tran, K Drossos, T Virtanen - 2021 29th European Signal …, 2021 - ieeexplore.ieee.org
Automated audio captioning (AAC) is a novel task, where a method takes as an input an
audio sample and outputs a textual description (i.e. a caption) of its contents. Most AAC …

Audio captioning using pre-trained model and data augmentation

T Huang, C Pan, W Chen, C Zhu, S Li, X Shao - 2022 - dcase.community
This technical report describes an automatic audio captioning system for task 6, Detection
and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge. Based on an …