GigaSpeech: An evolving, multi-domain ASR corpus with 10,000 hours of transcribed audio

G Chen, S Chai, G Wang, J Du, WQ Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition
corpus with 10,000 hours of high-quality labeled audio suitable for supervised training, and …

Findings of the IWSLT 2022 Evaluation Campaign

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

Prompting large language models for zero-shot domain adaptation in speech recognition

Y Li, Y Wu, J Li, S Liu - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
The integration of Language Models (LMs) has proven to be an effective way to address
domain shifts in speech recognition. However, these approaches usually require a …

Augmented datasheets for speech datasets and ethical decision-making

O Papakyriakopoulos, ASG Choi, W Thong… - Proceedings of the …, 2023 - dl.acm.org
Speech datasets are crucial for training Speech Language Technologies (SLT); however,
the lack of diversity of the underlying training data can lead to serious limitations in building …

Reproducing Whisper-style training using an open-source toolkit and publicly available data

Y Peng, J Tian, B Yan, D Berrebbi… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Pre-training speech models on large volumes of data has achieved remarkable success.
OpenAI Whisper is a multilingual multitask model trained on 680k hours of supervised …

Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study

X Chang, B Yan, K Choi, JW Jung, Y Lu… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Speech signals, typically sampled at rates in the tens of thousands per second, contain
redundancies, evoking inefficiencies in sequence modeling. High-dimensional speech …

OWSM v3.1: Better and faster open Whisper-style speech models based on E-Branchformer

Y Peng, J Tian, W Chen, S Arora, B Yan, Y Sudo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent studies have advocated for fully open foundation models to promote transparency
and open science. As an initial step, the Open Whisper-style Speech Model (OWSM) …

How might we create better benchmarks for speech recognition?

A Aksënova, D van Esch, J Flynn… - Proceedings of the 1st …, 2021 - aclanthology.org
The applications of automatic speech recognition (ASR) systems are proliferating, in part
due to recent significant quality improvements. However, as recent work indicates, even …

A study on the integration of pre-trained SSL, ASR, LM and SLU models for spoken language understanding

Y Peng, S Arora, Y Higuchi, Y Ueda… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org
Collecting sufficient labeled data for spoken language understanding (SLU) is expensive
and time-consuming. Recent studies achieved promising results by using pre-trained …

Adapting large language model with speech for fully formatted end-to-end speech recognition

S Ling, Y Hu, S Qian, G Ye, Y Qian… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Most end-to-end (E2E) speech recognition models are composed of encoder and decoder
blocks that perform acoustic and language modeling functions. Pretrained large language …