Recent advances in end-to-end automatic speech recognition
J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com
Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …
Contextual adapters for personalized speech recognition in neural transducers
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR)
models is a challenge due to the lack of training data. A standard way to address this issue …
Improving end-to-end contextual speech recognition with fine-grained contextual knowledge selection
Nowadays, most methods for end-to-end contextual speech recognition bias the recognition
process towards contextual knowledge. Since all-neural contextual biasing methods rely on …
Efficient domain adaptation for speech foundation models
Foundation models (FMs), that are trained on broad data at scale and are adaptable to a
wide range of downstream tasks, have brought large interest in the research community …
COSMIC: Data efficient instruction-tuning for speech in-context learning
We present a data and cost efficient way of incorporating the speech modality into a large
language model (LLM). The resulting multi-modal LLM is a COntextual Speech Model with …
End-to-end speech recognition contextualization with large language models
In recent years, Large Language Models (LLMs) have garnered significant attention from the
research community due to their exceptional performance and generalization capabilities. In …
Tree-constrained pointer generator for end-to-end contextual speech recognition
Contextual knowledge is important for real-world automatic speech recognition (ASR)
applications. In this paper, a novel tree-constrained pointer generator (TCPGen) component …
Contextualized end-to-end speech recognition with contextual phrase prediction network
Contextual information plays a crucial role in speech recognition technologies and
incorporating it into the end-to-end speech recognition models has drawn immense interest …
Listen, know and spell: Knowledge-infused subword modeling for improving ASR performance of OOV named entities
Automatic speech recognition (ASR) is increasingly being used in specialized domains such
as medical ASR and news transcription. Owing to the lack of high quality annotated speech …
Can contextual biasing remain effective with Whisper and GPT-2?
End-to-end automatic speech recognition (ASR) and large language models, such as
Whisper and GPT-2, have recently been scaled to use vast amounts of training data. Despite …