A comparative study on Transformer vs RNN in speech applications

S Karita, N Chen, T Hayashi, T Hori… - 2019 IEEE automatic …, 2019 - ieeexplore.ieee.org
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …

AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline

H Bu, J Du, X Na, B Wu, H Zheng - … of the oriental chapter of the …, 2017 - ieeexplore.ieee.org
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the
largest corpus suitable for conducting speech recognition research and building …

Audio augmentation for speech recognition

T Ko, V Peddinti, D Povey, S Khudanpur - Interspeech, 2015 - isca-archive.org
Data augmentation is a common strategy adopted to increase the quantity of training data,
avoid overfitting and improve robustness of the models. In this paper, we investigate audio …

Pattern mining approaches used in sensor-based biometric recognition: a review

J Chaki, N Dey, F Shi, RS Sherratt - IEEE Sensors Journal, 2019 - ieeexplore.ieee.org
Sensing technologies have generated significant interest in the use of biometrics for the recognition
and assessment of individuals. Pattern mining techniques have become a critical step in …

Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM

T Hori, S Watanabe, Y Zhang, W Chan - arXiv preprint arXiv:1706.02737, 2017 - arxiv.org
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We
learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) …

VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature

C Du, Y Guo, X Chen, K Yu - arXiv preprint arXiv:2204.00768, 2022 - arxiv.org
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an
acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder …

Automatic speech recognition method based on deep learning approaches for Uzbek language

A Mukhamadiyev, I Khujayarov, O Djuraev, J Cho - Sensors, 2022 - mdpi.com
Communication has been an important aspect of human life, civilization, and globalization
for thousands of years. Biometric analysis, education, security, healthcare, and smart cities …

Language independent end-to-end architecture for joint language identification and speech recognition

S Watanabe, T Hori, JR Hershey - 2017 IEEE Automatic Speech …, 2017 - ieeexplore.ieee.org
End-to-end automatic speech recognition (ASR) can significantly reduce the burden of
developing ASR systems for new languages, by eliminating the need for linguistic …

UniCATS: A unified context-aware text-to-speech framework with contextual VQ-diffusion and vocoding

C Du, Y Guo, F Shen, Z Liu, Z Liang, X Chen… - Proceedings of the …, 2024 - ojs.aaai.org
The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens,
has been proven superior to traditional acoustic feature mel-spectrograms in terms of …

Emotion recognition by fusing time synchronous and time asynchronous representations

W Wu, C Zhang, PC Woodland - ICASSP 2021-2021 IEEE …, 2021 - ieeexplore.ieee.org
In this paper, a novel two-branch neural network model structure is proposed for multimodal
emotion recognition, which consists of a time synchronous branch (TSB) and a time …