End-to-end speech recognition: A survey
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …
learning has brought considerable reductions in word error rate of more than 50% relative …
Deep learning for audio signal processing
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …
Speaker recognition from raw waveform with sincnet
M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …
speaker recognition. Promising results have been recently obtained with Convolutional …
[PDF][PDF] Wavenet: A generative model for raw audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …
The model is fully probabilistic and autoregressive, with the predictive distribution for each …
Wavenet: A generative model for raw audio
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …
The model is fully probabilistic and autoregressive, with the predictive distribution for each …
Unsupervised speech representation learning using wavenet autoencoders
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …
speech by applying autoencoding neural networks to speech waveforms. The goal is to …
[图书][B] Automatic speech recognition
Automatic Speech Recognition (ASR), which is aimed to enable natural human–machine
interaction, has been an intensive research area for decades. Many core technologies, such …
interaction, has been an intensive research area for decades. Many core technologies, such …
Very deep convolutional neural networks for raw waveforms
Learning acoustic models directly from the raw waveform data with minimal processing is
challenging. Current waveform-based models have generally used very few (~ 2) …
challenging. Current waveform-based models have generally used very few (~ 2) …
[PDF][PDF] Learning the speech front-end with raw waveform CLDNNs.
Learning an acoustic model directly from the raw waveform has been an active area of
research. However, waveformbased models have not yet matched the performance of …
research. However, waveformbased models have not yet matched the performance of …
The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de) composition …
The discriminative lexicon is introduced as a mathematical and computational model of the
mental lexicon. This novel theory is inspired by word and paradigm morphology but …
mental lexicon. This novel theory is inspired by word and paradigm morphology but …