The 2020 ESPnet update: new features, broadened applications, performance improvements, and...

S Khurana, A Laurent, J Glass - IEEE Journal of Selected …, 2022 - ieeexplore.ieee.org

We propose the SAMU-XLSR: Semantically-Aligned Multimodal Utterance-level Cross-Lingual
Speech Representation learning framework. Unlike previous works on speech representation …

Cited by 35 Related articles All 6 versions

[PDF] arxiv.org

A comparative study on non-autoregressive modelings for speech-to-text generation

Y Higuchi, N Chen, Y Fujita, H Inaguma… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org

Non-autoregressive (NAR) models simultaneously generate multiple outputs in a sequence,
which significantly reduces inference time at the cost of an accuracy drop compared to …

Cited by 47 Related articles All 6 versions

[PDF] arxiv.org

Investigating self-supervised pretraining frameworks for pathological speech recognition

LP Violeta, WC Huang, T Toda - arXiv preprint arXiv:2203.15431, 2022 - arxiv.org

We investigate the performance of self-supervised pretraining frameworks on pathological
speech datasets used for automatic speech recognition (ASR). Modern end-to-end models …

Cited by 30 Related articles All 5 versions

[PDF] arxiv.org

Anonymizing speech with generative adversarial networks to preserve speaker privacy

S Meyer, P Tilli, P Denisov, F Lux… - 2022 IEEE Spoken …, 2023 - ieeexplore.ieee.org

In order to protect the privacy of speech data, speaker anonymization aims for hiding the
identity of a speaker by changing the voice in speech recordings. This typically comes with a …

Cited by 21 Related articles All 3 versions

[PDF] arxiv.org

Speaker anonymization with phonetic intermediate representations

S Meyer, F Lux, P Denisov, J Koch, P Tilli… - arXiv preprint arXiv …, 2022 - arxiv.org

In this work, we propose a speaker anonymization pipeline that leverages high quality
automatic speech recognition and synthesis systems to generate speech conditioned on …

Cited by 23 Related articles All 4 versions

[PDF] arxiv.org

Improving noise robustness of contrastive speech representation learning with speech reconstruction

H Wang, Y Qian, X Wang, Y Wang… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Noise robustness is essential for deploying automatic speech recognition (ASR) systems in
real-world environments. One way to reduce the effect of noise interference is to employ a …

Cited by 28 Related articles All 3 versions

[PDF] mdpi.com

Mispronunciation detection and diagnosis with articulatory-level feedback generation for non-native Arabic speech

M Algabri, H Mathkour, M Alsulaiman, MA Bencherif - Mathematics, 2022 - mdpi.com

A high-performance versatile computer-assisted pronunciation training (CAPT) system that
provides the learner immediate feedback as to whether their pronunciation is correct is very …

Cited by 19 Related articles All 7 versions

[PDF] arxiv.org

Magic dust for cross-lingual adaptation of monolingual wav2vec-2.0

S Khurana, A Laurent, J Glass - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org

We propose a simple and effective cross-lingual transfer learning method to adapt
monolingual wav2vec-2.0 models for Automatic Speech Recognition (ASR) in resource …

Cited by 25 Related articles All 11 versions

[PDF] arxiv.org

Human listening and live captioning: Multi-task training for speech enhancement

SE Eskimez, X Wang, M Tang, H Yang, Z Zhu… - arXiv preprint arXiv …, 2021 - arxiv.org

With the surge of online meetings, it has become more critical than ever to provide
high-quality speech audio and live captioning under various noise conditions. However, most …

Cited by 26 Related articles All 5 versions

[PDF] arxiv.org

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification

S Takamichi, L Kürzinger, T Saeki, S Shiota… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we construct a new Japanese speech corpus called "JTubeSpeech." Although
recent end-to-end learning requires large-size speech corpora, open-sourced such corpora …

Cited by 20 Related articles All 2 versions