Semi-orthogonal low-rank matrix factorization for deep neural networks.

D Povey, G Cheng, Y Wang, K Li, H Xu… - Interspeech, 2018 - academia.edu
Time Delay Neural Networks (TDNNs), also known as one-dimensional
Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural …

Combining machine learning and process engineering physics towards enhanced accuracy and explainability of data-driven models

T Bikmukhametov, J Jäschke - Computers & Chemical Engineering, 2020 - Elsevier
Machine learning models are often considered as black-box solutions which is one
of the main reasons why they are still not widely used in operation of process engineering …

Emotion Identification from Raw Speech Signals Using DNNs.

M Sarma, P Ghahremani, D Povey, NK Goel… - Interspeech, 2018 - danielpovey.com
We investigate a number of Deep Neural Network (DNN) architectures for emotion
identification with the IEMOCAP database. First we compare different feature extraction …

Low latency acoustic modeling using temporal convolution and LSTMs

V Peddinti, Y Wang, D Povey… - IEEE Signal Processing …, 2017 - ieeexplore.ieee.org
Bidirectional long short-term memory (BLSTM) acoustic models provide a significant word
error rate reduction compared to their unidirectional counterpart, as they model both the past …

The CAPIO 2017 conversational speech recognition system

KJ Han, A Chandrashekaran, J Kim, I Lane - arXiv preprint arXiv …, 2017 - arxiv.org
In this paper we show how we have achieved the state-of-the-art performance on the
industry-standard NIST 2000 Hub5 English evaluation set. We explore densely connected …

A teacher-student learning approach for unsupervised domain adaptation of sequence-trained ASR models

V Manohar, P Ghahremani, D Povey… - 2018 IEEE Spoken …, 2018 - ieeexplore.ieee.org
Teacher-student (TS) learning is a transfer learning approach, where a teacher network is
used to “teach” a student network to make the same predictions as the teacher. Originally …

Pushing the boundaries of audiovisual word recognition using residual networks and LSTMs

T Stafylakis, MH Khan, G Tzimiropoulos - Computer Vision and Image …, 2018 - Elsevier
Visual and audiovisual speech recognition are witnessing a renaissance which is largely
due to the advent of deep learning methods. In this paper, we present a deep learning …

Advances in subword-based HMM-DNN speech recognition across languages

P Smit, S Virpioja, M Kurimo - Computer Speech & Language, 2021 - Elsevier
We describe a novel way to implement subword language models in speech recognition
systems based on weighted finite state transducers, hidden Markov models, and deep …

Automatic lyrics transcription using dilated convolutional neural networks with self-attention

E Demirel, S Ahlbäck, S Dixon - 2020 International Joint …, 2020 - ieeexplore.ieee.org
Speech recognition is a well developed research field so that the current state of the art
systems are being used in many applications in the software industry, yet as by today, there …

Novel simulation of aqueous total nitrogen and phosphorus concentrations in Taihu Lake with machine learning

H Lu, L Yang, Y Fan, X Qian, T Liu - Environmental Research, 2022 - Elsevier
This study demonstrates the utility of internal nutrient loads as an additional parameter to
improve the performance of machine learning models in predicting the temporal variations of …