A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
DASpeech: Directed acyclic transformer for fast and high-quality speech-to-speech translation
Direct speech-to-speech translation (S2ST) translates speech from one language into
another using a single model. However, due to the presence of linguistic and acoustic …
DUB: Discrete unit back-translation for speech translation
How can speech-to-text translation (ST) perform as well as machine translation (MT)? The
key point is to bridge the modality gap between speech and text so that useful MT …
SpeechMatrix: A large-scale mined corpus of multilingual speech-to-speech translations
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech
translations mined from real speech of European Parliament recordings. It contains speech …
Back translation for speech-to-text translation without transcripts
The success of end-to-end speech-to-text translation (ST) is often achieved by utilizing
source transcripts, e.g., by pre-training with automatic speech recognition (ASR) and machine …
Multilingual speech-to-speech translation into multiple target languages
Speech-to-speech translation (S2ST) enables spoken communication between people
talking in different languages. Despite a few studies on multilingual S2ST, their focus is the …
Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?
Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose
the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end …
Speech-to-speech Low-resource Translation
Speech-to-speech translation (S2ST), particularly in the context of low-resource languages,
plays a vital role in facilitating global communication. However, comprehensive research in …
UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization
Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric
speech into normal-sounding speech. The technology eases communication with speakers …
TranSentence: Speech-to-Speech Translation via Language-Agnostic Sentence-Level Speech Encoding without Language-Parallel Data
Although there has been significant advancement in the field of speech-to-speech
translation, conventional models still require language-parallel speech data between the …