A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
learning. The use of multiple processing layers has enabled the creation of models capable …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
necessitated the building of specialist models for individual tasks and application scenarios …
Google usm: Scaling automatic speech recognition beyond 100 languages
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …
Beats: Audio pre-training with acoustic tokenizers
The massive growth of self-supervised learning (SSL) has been witnessed in language,
vision, speech, and audio domains over the past few years. While discrete label prediction is …
vision, speech, and audio domains over the past few years. While discrete label prediction is …
Audiopalm: A large language model that can speak and listen
PK Rubenstein, C Asawaroengchai… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce AudioPaLM, a large language model for speech understanding and
generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil …
generation. AudioPaLM fuses text-based and speech-based language models, PaLM-2 [Anil …
Soundstorm: Efficient parallel audio generation
We present SoundStorm, a model for efficient, non-autoregressive audio generation.
SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional …
SoundStorm receives as input the semantic tokens of AudioLM, and relies on bidirectional …
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
between any two languages? While recent breakthroughs in text-based models have …
Prompting large language models with speech recognition abilities
Large language models (LLMs) have proven themselves highly flexible, able to solve a wide
range of generative tasks, such as abstractive summarization and open-ended question …
range of generative tasks, such as abstractive summarization and open-ended question …
Seamless: Multilingual Expressive and Streaming Speech Translation
Large-scale automatic speech translation systems today lack key features that help machine-
mediated communication feel seamless when compared to human-to-human dialogue. In …
mediated communication feel seamless when compared to human-to-human dialogue. In …
Conditional adapters: Parameter-efficient transfer learning with fast inference
Abstract We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning
method that also improves inference efficiency. CoDA generalizes beyond standard adapter …
method that also improves inference efficiency. CoDA generalizes beyond standard adapter …