A comprehensive survey of automated audio captioning
Automated audio captioning, a task that mimics human perception as well as innovatively
links audio processing and natural language processing, has seen much progress over …
An encoder-decoder based audio captioning system with transfer and reinforcement learning
Automated audio captioning aims to use natural language to describe the content of audio
data. This paper presents an audio captioning system with an encoder-decoder architecture …
ACTUAL: Audio captioning with caption feature space regularization
Audio captioning aims at describing the content of audio clips with human language. Due to
the ambiguity of audio content, different people may perceive the same audio clip differently …
A CRNN-GRU Based Reinforcement Learning Approach to Audio Captioning.
Audio captioning aims at generating a natural sentence to describe the content in an audio
clip. This paper proposes the use of a powerful CRNN encoder combined with a GRU …
Improving the performance of automated audio captioning via integrating the acoustic and semantic information
Automated audio captioning (AAC) has developed rapidly in recent years, involving acoustic
signal processing and natural language processing to generate human-readable sentences …
The SJTU system for DCASE2021 challenge task 6: Audio captioning based on encoder pre-training and reinforcement learning
This report proposes an audio captioning system for the Detection and Classification of
Acoustic Scenes and Events (DCASE) 2021 Challenge Task 6. Our audio captioning …
Wavetransformer: A novel architecture for audio captioning based on learning temporal and time-frequency information
Automated audio captioning (AAC) is a novel task, where a method takes as an input an
audio sample and outputs a textual description (i.e., a caption) of its contents. Most AAC …
Audio caption in a car setting with a sentence-level loss
Captioning has attracted much attention in image and video understanding while a small
amount of work examines audio captioning. This paper contributes a Mandarin-annotated …
Wavetransformer: An architecture for audio captioning based on learning temporal and time-frequency information
Automated audio captioning (AAC) is a novel task, where a method takes as an input an
audio sample and outputs a textual description (i.e., a caption) of its contents. Most AAC …
Audio captioning using pre-trained model and data augmentation
This technical report describes an automatic audio captioning system for task 6, Detection
and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge. Based on an …