SALMONN: Towards generic hearing abilities for large language models

C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
Hearing is arguably an essential ability of artificial intelligence (AI) agents in the physical
world, which refers to the perception and understanding of general auditory information …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Connecting speech encoder and large language model for ASR

W Yu, C Tang, G Sun, X Chen, T Tan… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
The impressive capability and versatility of large language models (LLMs) have aroused
increasing attention in automatic speech recognition (ASR), with several pioneering studies …

Can Whisper Perform Speech-Based In-Context Learning?

S Wang, CH Yang, J Wu… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
This paper investigates the in-context learning abilities of the Whisper automatic speech
recognition (ASR) models released by OpenAI. A novel speech-based in-context learning …

SALM: Speech-augmented language model with in-context learning for speech recognition and translation

Z Chen, H Huang, A Andrusenko… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We present a novel Speech Augmented Language Model (SALM) with multitask and in-context
learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a …

COSMIC: Data efficient instruction-tuning for speech in-context learning

J Pan, J Wu, Y Gaur, S Sivasankaran, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a data and cost efficient way of incorporating the speech modality into a large
language model (LLM). The resulting multi-modal LLM is a COntextual Speech Model with …

End-to-end speech recognition contextualization with large language models

E Lakomkin, C Wu, Y Fathullah, O Kalinli… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
In recent years, Large Language Models (LLMs) have garnered significant attention from the
research community due to their exceptional performance and generalization capabilities. In …

Boosting large language model for speech synthesis: An empirical study

H Hao, L Zhou, S Liu, J Li, S Hu, R Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have made significant advancements in natural language
processing and are concurrently extending the language ability to other modalities, such as …

Exploring autonomous agents through the lens of large language models: A review

S Barua - arXiv preprint arXiv:2404.04442, 2024 - arxiv.org
Large Language Models (LLMs) are transforming artificial intelligence, enabling
autonomous agents to perform diverse tasks across various domains. These agents …

TransLLaMa: LLM-based simultaneous translation system

R Koshkin, K Sudoh, S Nakamura - arXiv preprint arXiv:2402.04636, 2024 - arxiv.org
Decoder-only large language models (LLMs) have recently demonstrated impressive
capabilities in text generation and reasoning. Nonetheless, they have limited applications in …