End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

Speaker recognition from raw waveform with SincNet

M Ravanelli, Y Bengio - 2018 IEEE spoken language …, 2018 - ieeexplore.ieee.org
Deep learning is progressively gaining popularity as a viable alternative to i-vectors for
speaker recognition. Promising results have been recently obtained with Convolutional …

WaveNet: A generative model for raw audio

A Van Den Oord, S Dieleman, H Zen… - arXiv preprint arXiv …, 2016 - academia.edu
This paper introduces WaveNet, a deep neural network for generating raw audio waveforms.
The model is fully probabilistic and autoregressive, with the predictive distribution for each …

Unsupervised speech representation learning using WaveNet autoencoders

J Chorowski, RJ Weiss, S Bengio… - … /ACM transactions on …, 2019 - ieeexplore.ieee.org
We consider the task of unsupervised extraction of meaningful latent representations of
speech by applying autoencoding neural networks to speech waveforms. The goal is to …

[BOOK][B] Automatic speech recognition

D Yu, L Deng - 2016 - Springer
Automatic Speech Recognition (ASR), which is aimed to enable natural human–machine
interaction, has been an intensive research area for decades. Many core technologies, such …

Very deep convolutional neural networks for raw waveforms

W Dai, C Dai, S Qu, J Li, S Das - 2017 IEEE international …, 2017 - ieeexplore.ieee.org
Learning acoustic models directly from the raw waveform data with minimal processing is
challenging. Current waveform-based models have generally used very few (~2) …

Learning the speech front-end with raw waveform CLDNNs.

TN Sainath, RJ Weiss, AW Senior, KW Wilson… - Interspeech, 2015 - isca-archive.org
Learning an acoustic model directly from the raw waveform has been an active area of
research. However, waveform-based models have not yet matched the performance of …

The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition …

RH Baayen, YY Chuang, E Shafaei-Bajestan… - …, 2019 - Wiley Online Library
The discriminative lexicon is introduced as a mathematical and computational model of the
mental lexicon. This novel theory is inspired by word and paradigm morphology but …