A comparison review of transfer learning and self-supervised learning: Definitions, applications, advantages and limitations

Z Zhao, L Alzubaidi, J Zhang, Y Duan, Y Gu - Expert Systems with …, 2023 - Elsevier
Deep learning has emerged as a powerful tool in various domains, revolutionising machine
learning research. However, one persistent challenge is the scarcity of labelled training …

AudioLDM 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

W Wu, X Chen, X Wu, H Li, H Meng - arXiv preprint arXiv:2403.16078, 2024 - arxiv.org
Audio-visual target speech extraction (AV-TSE) is one of the enabling technologies in
robotics and many audio-visual applications. One of the challenges of AV-TSE is how to …

Exploring speech representations for proficiency assessment in language learning

E Islam, C Park, T Hain - 9th Workshop on Speech and …, 2023 - eprints.whiterose.ac.uk
Automatic proficiency assessment can be a useful tool in language learning, for self-
evaluation of language skills and to enable educators to tailor instruction effectively. Often …

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives

D Raj - arXiv preprint arXiv:2402.08932, 2024 - arxiv.org
Since the first speech recognition systems were built more than 30 years ago, improvement
in voice technology has enabled applications such as smart assistants and automated …

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

X Wang, Z Chen, Y Shi, J Wu, N Kanda… - arXiv preprint arXiv …, 2022 - arxiv.org
Employing a monaural speech separation (SS) model as a front-end for automatic speech
recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model …

SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition

D Raj, D Povey, S Khudanpur - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
The Streaming Unmixing and Recognition Transducer (SURT) model was proposed recently
as an end-to-end approach for continuous, streaming, multi-talker speech recognition (ASR) …

Neural Enhancement Strategies for Robust Speech Processing

MNAM Nawar - 2023 - iris.unitn.it
In real-world scenarios, speech signals are often contaminated with environmental noises,
and reverberation, which degrades speech quality and intelligibility. Lately, the development …

Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory

R Li, Z Xie, H Xu, Y Peng, H Liu, H Huang, ES Chng - isca-archive.org
Accent recognition (AR) is challenging due to the lack of training data and because accents
are entangled with speakers and regional characteristics. This paper aims to improve AR …