Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Fairseq S2T: Fast speech-to-text modeling with fairseq

C Wang, Y Tang, X Ma, A Wu, S Popuri… - arXiv preprint arXiv …, 2020 - arxiv.org
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such
as end-to-end speech recognition and speech-to-text translation. It follows fairseq's careful …

High fidelity speech synthesis with adversarial networks

M Bińkowski, J Donahue, S Dieleman, A Clark… - arXiv preprint arXiv …, 2019 - arxiv.org
Generative adversarial networks have seen rapid development in recent years and have led
to remarkable improvements in generative modelling of images. However, their application …

NeMo: a toolkit for building AI applications using neural modules

O Kuchaiev, J Li, H Nguyen, O Hrinchuk… - arXiv preprint arXiv …, 2019 - arxiv.org
NeMo (Neural Modules) is a Python framework-agnostic toolkit for creating AI applications
through re-usability, abstraction, and composition. NeMo is built around neural modules …

Jasper: An end-to-end convolutional neural acoustic model

J Li, V Lavrukhin, B Ginsburg, R Leary… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech
recognition models without any external training data. Our model, Jasper, uses only 1D …

ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS,
which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …

Transformers in speech processing: A survey

S Latif, A Zaidi, H Cuayahuitl, F Shamshad… - arXiv preprint arXiv …, 2023 - arxiv.org
The remarkable success of transformers in the field of natural language processing has
sparked the interest of the speech-processing community, leading to an exploration of their …

Wav2letter++: A fast open-source speech recognition system

V Pratap, A Hannun, Q Xu, J Cai, J Kahn… - ICASSP 2019-2019 …, 2019 - ieeexplore.ieee.org
This paper introduces wav2letter++, a fast open-source deep learning speech recognition
framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for …

ESPnet-ST: All-in-one speech translation toolkit

H Inaguma, S Kiyono, K Duh, S Karita… - arXiv preprint arXiv …, 2020 - arxiv.org
We present ESPnet-ST, which is designed for the quick development of speech-to-speech
translation systems in a single framework. ESPnet-ST is a new project inside end-to-end …

Learning robust and multilingual speech representations

K Kawakami, L Wang, C Dyer, P Blunsom… - arXiv preprint arXiv …, 2020 - arxiv.org
Unsupervised speech representation learning has shown remarkable success at finding
representations that correlate with phonetic structures and improve downstream speech …