Visual attention methods in deep learning: An in-depth survey

M Hassanin, S Anwar, I Radwan, FS Khan, A Mian - Information Fusion, 2024 - Elsevier
Inspired by the human cognitive system, attention is a mechanism that imitates the human
cognitive awareness about specific information, amplifying critical details to focus more on …

A survey on deep learning for big data

Q Zhang, LT Yang, Z Chen, P Li - Information Fusion, 2018 - Elsevier
Deep learning, as one of the most currently remarkable machine learning techniques, has
achieved great success in many applications such as image analysis, speech recognition …

Convolutional, long short-term memory, fully connected deep neural networks

TN Sainath, O Vinyals, A Senior… - 2015 IEEE international …, 2015 - ieeexplore.ieee.org
Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have
shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech …

Learning the speech front-end with raw waveform CLDNNs

TN Sainath, RJ Weiss, AW Senior, KW Wilson… - Interspeech, 2015 - isca-archive.org
Learning an acoustic model directly from the raw waveform has been an active area of
research. However, waveform-based models have not yet matched the performance of …

Highway long short-term memory RNNs for distant speech recognition

Y Zhang, G Chen, D Yu, K Yao… - … on acoustics, speech …, 2016 - ieeexplore.ieee.org
In this paper, we extend the deep long short-term memory (DLSTM) recurrent neural
networks by introducing gated direct connections between memory cells in adjacent layers …

Multichannel signal processing with deep neural networks for automatic speech recognition

TN Sainath, RJ Weiss, KW Wilson, B Li… - … on Audio, Speech …, 2017 - ieeexplore.ieee.org
Multichannel automatic speech recognition (ASR) systems commonly separate speech
enhancement, including localization, beamforming, and postfiltering, from acoustic …

Sparse overcomplete word vector representations

M Faruqui, Y Tsvetkov, D Yogatama, C Dyer… - arXiv preprint arXiv …, 2015 - arxiv.org
Current distributed representations of words show little resemblance to theories of lexical
semantics. The former are dense and uninterpretable, the latter largely based on familiar …

Scalable training of deep learning machines by incremental block training with intra-block parallel optimization and blockwise model-update filtering

K Chen, Q Huo - … conference on acoustics, speech and signal …, 2016 - ieeexplore.ieee.org
We present a new approach to scalable training of deep learning machines by incremental
block training with intra-block parallel optimization to leverage data parallelism and …

Lower Frame Rate Neural Network Acoustic Models

G Pundak, TN Sainath - Interspeech, 2016 - isca-archive.org
Recently neural network acoustic models trained with Connectionist Temporal Classification
(CTC) were proposed as an alternative approach to conventional cross-entropy trained …

Neural network adaptive beamforming for robust multichannel speech recognition

B Li, TN Sainath, RJ Weiss, KW Wilson, M Bacchiani - Interspeech, 2016 - isca-archive.org
Joint multichannel enhancement and acoustic modeling using neural networks has shown
promise over the past few years. However, one shortcoming of previous work [1, 2, 3] is that …