Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

T Park, I Medennikov, K Dhawan, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks

H Huang, T Park, K Dhawan, I Medennikov… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-supervised learning has been shown to benefit a wide range of speech processing
tasks, such as speech recognition/translation, speaker verification and diarization, etc …

Chain-of-Thought Prompting for Speech Translation

K Hu, Z Chen, CHH Yang, P Żelasko… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable advancements in language
understanding and generation. Building on the success of text-based LLMs, recent research …

EMMeTT: Efficient Multimodal Machine Translation Training

P Żelasko, Z Chen, M Wang, D Galvez… - arXiv preprint arXiv …, 2024 - arxiv.org
Rising interest in extending foundation language models to new modalities warrants
discussion of the most effective and efficient multimodal training approach. This work …

ASR Benchmarking: Need for a More Representative Conversational Dataset

G Maheshwari, D Ivanov, T Johannet… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic Speech Recognition (ASR) systems have achieved remarkable performance on
widely used benchmarks such as LibriSpeech and Fleurs. However, these benchmarks do …