A comparative study on Transformer vs RNN in speech applications
Sequence-to-sequence models have been widely used in end-to-end speech processing,
for example, automatic speech recognition (ASR), speech translation (ST), and text-to …
AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline
An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the
largest corpus suitable for conducting speech recognition research and building …
Audio augmentation for speech recognition
Data augmentation is a common strategy adopted to increase the quantity of training data,
avoid overfitting, and improve the robustness of models. In this paper, we investigate audio …
Pattern mining approaches used in sensor-based biometric recognition: a review
Sensing technologies have generated significant interest in the use of biometrics for the recognition
and assessment of individuals. Pattern mining techniques have become a critical step in …
Advances in joint CTC-attention based end-to-end speech recognition with a deep CNN encoder and RNN-LM
We present a state-of-the-art end-to-end Automatic Speech Recognition (ASR) model. We
learn to listen and write characters with a joint Connectionist Temporal Classification (CTC) …
VQTTS: High-fidelity text-to-speech synthesis with self-supervised VQ acoustic feature
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an
acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder …
Automatic speech recognition method based on deep learning approaches for Uzbek language
Communication has been an important aspect of human life, civilization, and globalization
for thousands of years. Biometric analysis, education, security, healthcare, and smart cities …
Language independent end-to-end architecture for joint language identification and speech recognition
End-to-end automatic speech recognition (ASR) can significantly reduce the burden of
developing ASR systems for new languages, by eliminating the need for linguistic …
UniCATS: A unified context-aware text-to-speech framework with contextual VQ-diffusion and vocoding
The utilization of discrete speech tokens, divided into semantic tokens and acoustic tokens,
has been proven superior to traditional mel-spectrogram acoustic features in terms of …
Emotion recognition by fusing time synchronous and time asynchronous representations
In this paper, a novel two-branch neural network model structure is proposed for multimodal
emotion recognition, which consists of a time synchronous branch (TSB) and a time …