End-to-end speech recognition and keyword search on low-resource languages

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 417 · Related articles · All 7 versions

[PDF] ieee.org

Acoustic modeling based on deep learning for low-resource speech recognition: An overview

C Yu, M Kang, Y Chen, J Wu, X Zhao - IEEE Access, 2020 - ieeexplore.ieee.org

The polarization of world languages is becoming more and more obvious. Many languages,
mainly endangered languages, are of low-resource attribute due to lack of information. Both …

Cited by 34 · Related articles · All 2 versions

[PDF] arxiv.org

Multilingual sequence-to-sequence speech recognition: architecture, transfer learning, and language modeling

J Cho, MK Baskar, R Li, M Wiesner… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org

Sequence-to-sequence (seq2seq) approach for low-resource ASR is a relatively new
direction in speech research. The approach benefits by performing model training without …

Cited by 151 · Related articles · All 12 versions

[PDF] arxiv.org

Deep lip reading: a comparison of models and an online application

T Afouras, JS Chung, A Zisserman - arXiv preprint arXiv:1806.06053, 2018 - arxiv.org

The goal of this paper is to develop state-of-the-art models for lip reading--visual speech
recognition. We develop three architectures and compare their accuracy and training …

Cited by 142 · Related articles · All 15 versions

[PDF] thecvf.com

Mixspeech: Cross-modality self-learning with audio-visual stream mixup for visual speech translation and recognition

X Cheng, T Jin, R Huang, L Li, W Lin… - Proceedings of the …, 2023 - openaccess.thecvf.com

Multi-media communications facilitate global interaction among people. However, despite
researchers exploring cross-lingual translation techniques such as machine translation and …

Cited by 23 · Related articles · All 6 versions

[PDF] arxiv.org

Attention-based end-to-end models for small-footprint keyword spotting

C Shan, J Zhang, Y Wang, L Xie - arXiv preprint arXiv:1803.10916, 2018 - arxiv.org

In this paper, we propose an attention-based end-to-end neural approach for small-footprint
keyword spotting (KWS), which aims to simplify the pipelines of building a production-quality …

Cited by 129 · Related articles · All 7 versions

[PDF] arxiv.org

Streaming small-footprint keyword spotting using sequence-to-sequence models

Y He, R Prabhavalkar, K Rao, W Li… - 2017 IEEE Automatic …, 2017 - ieeexplore.ieee.org

We develop streaming keyword spotting systems using a recurrent neural network
transducer (RNN-T) model: an all-neural, end-to-end trained, sequence-to-sequence model …

Cited by 105 · Related articles · All 7 versions

[PDF] arxiv.org

Seeing wake words: Audio-visual keyword spotting

L Momeni, T Afouras, T Stafylakis, S Albanie… - arXiv preprint arXiv …, 2020 - arxiv.org

The goal of this work is to automatically determine whether and when a word of interest is
spoken by a talking face, with or without the audio. We propose a zero-shot method suitable …

Cited by 53 · Related articles · All 7 versions

[PDF] arxiv.org

End-to-end speech recognition from federated acoustic models

Y Gao, T Parcollet, S Zaiem… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Training Automatic Speech Recognition (ASR) models under federated learning (FL)
settings has attracted a lot of attention recently. However, the FL scenarios often presented …

Cited by 45 · Related articles · All 9 versions

[PDF] arxiv.org

Language-agnostic multilingual modeling

A Datta, B Ramabhadran, J Emond… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

Multilingual Automated Speech Recognition (ASR) systems allow for the joint training of
data-rich and data-scarce languages in a single model. This enables data and parameter …

Cited by 40 · Related articles · All 5 versions