A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

DASpeech: Directed acyclic transformer for fast and high-quality speech-to-speech translation

Q Fang, Y Zhou, Y Feng - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Direct speech-to-speech translation (S2ST) translates speech from one language into
another using a single model. However, due to the presence of linguistic and acoustic …

DUB: Discrete unit back-translation for speech translation

D Zhang, R Ye, T Ko, M Wang, Y Zhou - arXiv preprint arXiv:2305.11411, 2023 - arxiv.org
How can speech-to-text translation (ST) perform as well as machine translation (MT)? The
key point is to bridge the modality gap between speech and text so that useful MT …

SpeechMatrix: A large-scale mined corpus of multilingual speech-to-speech translations

PA Duquenne, H Gong, N Dong, J Du, A Lee… - arXiv preprint arXiv …, 2022 - arxiv.org
We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech
translations mined from real speech of European Parliament recordings. It contains speech …

Back translation for speech-to-text translation without transcripts

Q Fang, Y Feng - arXiv preprint arXiv:2305.08709, 2023 - arxiv.org
The success of end-to-end speech-to-text translation (ST) is often achieved by utilizing
source transcripts, e.g., by pre-training with automatic speech recognition (ASR) and machine …

Multilingual speech-to-speech translation into multiple target languages

H Gong, N Dong, S Popuri, V Goswami, A Lee… - arXiv preprint arXiv …, 2023 - arxiv.org
Speech-to-speech translation (S2ST) enables spoken communication between people
talking in different languages. Despite a few studies on multilingual S2ST, their focus is the …

Can We Achieve High-quality Direct Speech-to-Speech Translation without Parallel Speech Data?

Q Fang, S Zhang, Z Ma, M Zhang, Y Feng - arXiv preprint arXiv …, 2024 - arxiv.org
Recently proposed two-pass direct speech-to-speech translation (S2ST) models decompose
the task into speech-to-text translation (S2TT) and text-to-speech (TTS) within an end-to-end …

Speech-to-speech Low-resource Translation

HC Liu, MY Day, CC Wang - 2023 IEEE 24th International …, 2023 - ieeexplore.ieee.org
Speech-to-speech translation (S2ST), particularly in the context of low-resource languages,
plays a vital role in facilitating global communication. However, comprehensive research in …

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

Y Wang, X Wu, D Wang, L Meng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric
speech into normal-sounding speech. The technology eases communication with speakers …

TranSentence: Speech-to-speech Translation via Language-Agnostic Sentence-Level Speech Encoding without Language-Parallel Data

SB Kim, SH Lee, SW Lee - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Although there has been significant advancement in the field of speech-to-speech
translation, conventional models still require language-parallel speech data between the …