Retrieval-augmented generation for AI-generated content: A survey

P Zhao, H Zhang, Q Yu, Z Wang, Y Geng, F Fu… - arXiv preprint arXiv …, 2024 - arxiv.org
The development of Artificial Intelligence Generated Content (AIGC) has been facilitated by
advancements in model algorithms, scalable foundation model architectures, and the …

Zero-Shot Audio Captioning Using Soft and Hard Prompts

Y Zhang, X Xu, R Du, H Liu, Y Dong, ZH Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
In traditional audio captioning methods, a model is usually trained in a fully supervised
manner using a human-annotated dataset containing audio-text pairs and then evaluated on …

EDTC: enhance depth of text comprehension in automated audio captioning

L Tan, Y Cao, Y Zhou - arXiv preprint arXiv:2402.17259, 2024 - arxiv.org
Modality discrepancies have perpetually posed significant challenges within the realm of
Automated Audio Captioning (AAC) and across all multi-modal domains. Facilitating models …

Dissecting Temporal Understanding in Text-to-Audio Retrieval

AM Oncescu, JF Henriques, AS Koepke - ACM Multimedia 2024 - openreview.net
Recent advancements in machine learning have fueled research on multimodal interactions,
such as for instance text-to-video and text-to-audio retrieval tasks. These tasks require …

Domain Adaptation for Contrastive Audio-Language Models

S Deshmukh, R Singh, B Raj - arXiv preprint arXiv:2402.09585, 2024 - arxiv.org
Audio-Language Models (ALM) aim to be general-purpose audio models by providing zero-shot
capabilities at test time. The zero-shot performance of ALM improves by using suitable …