Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
We propose Sortformer, a novel neural model for speaker diarization, trained with
unconventional objectives compared to existing end-to-end diarization models. The …
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
Self-supervised learning has proven to benefit a wide range of speech processing
tasks, such as speech recognition/translation, speaker verification and diarization, etc …
Chain-of-Thought Prompting for Speech Translation
Large language models (LLMs) have demonstrated remarkable advancements in language
understanding and generation. Building on the success of text-based LLMs, recent research …
EMMeTT: Efficient Multimodal Machine Translation Training
A rising interest in the modality extension of foundation language models warrants
discussion on the most effective, and efficient, multimodal training approach. This work …
ASR Benchmarking: Need for a More Representative Conversational Dataset
G Maheshwari, D Ivanov, T Johannet… - arXiv preprint arXiv …, 2024 - arxiv.org
Automatic Speech Recognition (ASR) systems have achieved remarkable performance on
widely used benchmarks such as LibriSpeech and Fleurs. However, these benchmarks do …