TERA: Self-supervised learning of Transformer encoder representation for speech

AT Liu, SW Li, H Lee - IEEE/ACM Transactions on Audio …, 2021 - ieeexplore.ieee.org
We introduce a self-supervised speech pre-training method called TERA, which stands for
Transformer Encoder Representations from Alteration. Recent approaches often learn by …

Survey on deep neural networks in speech and vision systems

M Alam, MD Samad, L Vidyaratne, A Glandon… - Neurocomputing, 2020 - Elsevier
This survey presents a review of state-of-the-art deep neural network architectures,
algorithms, and systems in speech and vision applications. Recent advances in deep …

THCHS-30: A free Chinese speech corpus

D Wang, X Zhang - arXiv preprint arXiv:1512.01882, 2015 - arxiv.org
Speech data is crucially important for speech recognition research. There are quite some
speech databases that can be purchased at prices that are reasonable for most research …

A review of shorthand systems: From brachygraphy to microtext and beyond

R Satapathy, E Cambria, A Nanetti, A Hussain - Cognitive Computation, 2020 - Springer
Human civilizations have performed the art of writing across continents and over different
time periods. In order to speed up the writing process, the art of shorthand (brachygraphy) …

Sequence modeling with CTC

A Hannun - Distill, 2017 - distill.pub
Consider speech recognition. We have a dataset of audio clips and corresponding
transcripts. Unfortunately, we don't know how the characters in the transcript align to the …

RNNDrop: A novel dropout for RNNs in ASR

T Moon, H Choi, H Lee, I Song - 2015 IEEE Workshop on …, 2015 - ieeexplore.ieee.org
Recently, recurrent neural networks (RNN) have achieved the state-of-the-art performance
in several applications that deal with temporal data, e.g., speech recognition, handwriting …

A survey of recent DNN architectures on the TIMIT phone recognition task

J Michalek, J Vaněk - Text, Speech, and Dialogue: 21st International …, 2018 - Springer
In this survey paper, we have evaluated several recent deep neural network (DNN)
architectures on a TIMIT phone recognition task. We chose the TIMIT corpus due to its …

Towards quantum language models

I Basile, F Tamburini - Proceedings of the 2017 Conference on …, 2017 - aclanthology.org
This paper presents a new approach for building Language Models using the Quantum
Probability Theory, a Quantum Language Model (QLM). It mainly shows that relying on this …

Community-supported shared infrastructure in support of speech accessibility

M Hasegawa-Johnson, X Zheng, H Kim… - Journal of Speech …, 2024 - pubs.asha.org
Purpose: The Speech Accessibility Project (SAP) intends to facilitate research and
development in automatic speech recognition (ASR) and other machine learning tasks for …

Cascaded tuning to amplitude modulation for natural sound recognition

T Koumura, H Terashima, S Furukawa - Journal of Neuroscience, 2019 - Soc Neuroscience
The auditory system converts the physical properties of a sound waveform to neural
activities and processes them for recognition. During the process, the tuning to amplitude …