GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition
corpus with 10,000 hours of high quality labeled audio suitable for supervised training, and …
Findings of the IWSLT 2022 Evaluation Campaign.
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …
Prompting large language models for zero-shot domain adaptation in speech recognition
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …
Augmented datasheets for speech datasets and ethical decision-making
Speech datasets are crucial for training Speech Language Technologies (SLT); however,
the lack of diversity of the underlying training data can lead to serious limitations in building …
Reproducing Whisper-style training using an open-source toolkit and publicly available data
Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …
Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study
Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech …
OWSM v3.1: Better and faster open Whisper-style speech models based on E-Branchformer
Recent studies have advocated for fully open foundation models to promote transparency
and open science. As an initial step, the Open Whisper-style Speech Model (OWSM) …
How might we create better benchmarks for speech recognition?
A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …
A study on the integration of pre-trained SSL, ASR, LM and SLU models for spoken language understanding
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …
Adapting large language model with speech for fully formatted end-to-end speech recognition
Most end-to-end (E2E) speech recognition models are composed of encoder and decoder
blocks that perform acoustic and language modeling functions. Pretrained large language …