Layer-wise analysis of a self-supervised speech representation model

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org
Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …

Reconsidering read and spontaneous speech: Causal perspectives on the generation of training data for automatic speech recognition

P Gabler, BC Geiger, B Schuppler, R Kern - Information, 2023 - mdpi.com
Superficially, read and spontaneous speech—the two main kinds of training data for
automatic speech recognition—appear as complementary but equal: pairs of texts and …

Automatic pronunciation assessment using self-supervised speech representation learning

E Kim, JJ Jeon, H Seo, H Kim - arXiv preprint arXiv:2204.03863, 2022 - arxiv.org
Self-supervised learning (SSL) approaches such as wav2vec 2.0 and HuBERT models have
shown promising results in various downstream tasks in the speech community. In particular …

Understanding the role of self attention for efficient speech recognition

K Shim, J Choi, W Sung - International Conference on Learning …, 2022 - openreview.net
Self-attention (SA) is a critical component of Transformer neural networks that have
succeeded in automatic speech recognition (ASR). In this paper, we analyze the role of SA …

What do self-supervised speech models know about words?

A Pasad, CM Chien, S Settle, K Livescu - Transactions of the …, 2024 - direct.mit.edu
Many self-supervised speech models (S3Ms) have been introduced over the last few years,
improving performance and data efficiency on various speech tasks. However, these …

Deep versus wide: An analysis of student architectures for task-agnostic knowledge distillation of self-supervised speech models

T Ashihara, T Moriya, K Matsuura, T Tanaka - arXiv preprint arXiv …, 2022 - arxiv.org
Self-supervised learning (SSL) is seen as a very promising approach with high performance
for several speech downstream tasks. Since the parameters of SSL models are generally so …

Probing speech emotion recognition transformers for linguistic knowledge

A Triantafyllopoulos, J Wagner, H Wierstorf… - arXiv preprint arXiv …, 2022 - arxiv.org
Large, pre-trained neural networks consisting of self-attention layers (transformers) have
recently achieved state-of-the-art results on several speech emotion recognition (SER) …

Automated recognition of Alzheimer's dementia using bag-of-deep-features and model ensembling

ZS Syed, MSS Syed, M Lech, E Pirogova - IEEE Access, 2021 - ieeexplore.ieee.org
Alzheimer's dementia is a progressive neurodegenerative disease that causes cognitive and
physical impairment. It severely deteriorates the quality of life in affected individuals. An …

Evidence of vocal tract articulation in self-supervised learning of speech

CJ Cho, P Wu, A Mohamed… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recent self-supervised learning (SSL) models have proven to learn rich representations of
speech, which can readily be utilized by diverse downstream tasks. To understand such …