Speech separation with large-scale self-supervised learning

[HTML] A comparison review of transfer learning and self-supervised learning: Definitions, applications, advantages and limitations

Z Zhao, L Alzubaidi, J Zhang, Y Duan, Y Gu - Expert Systems with …, 2023 - Elsevier

Deep learning has emerged as a powerful tool in various domains, revolutionising machine
learning research. However, one persistent challenge is the scarcity of labelled training …

Cited by 39 · All 4 versions

[PDF] arxiv.org

Audioldm 2: Learning holistic audio generation with self-supervised pretraining

H Liu, Y Yuan, X Liu, X Mei, Q Kong… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Although audio generation shares commonalities across different types of audio, such as
speech, music, and sound effects, designing models for each type requires careful …

Cited by 54 · All 5 versions

[PDF] arxiv.org

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

W Wu, X Chen, X Wu, H Li, H Meng - arXiv preprint arXiv:2403.16078, 2024 - arxiv.org

Audio-visual target speech extraction (AV-TSE) is one of the enabling technologies in
robotics and many audio-visual applications. One of the challenges of AV-TSE is how to …

Exploring speech representations for proficiency assessment in language learning

E Islam, C Park, T Hain - 9th Workshop on Speech and …, 2023 - eprints.whiterose.ac.uk

Automatic proficiency assessment can be a useful tool in language learning, for self-
evaluation of language skills and to enable educators to tailor instruction effectively. Often …

Cited by 1 · All 3 versions

[PDF] arxiv.org

Listening to Multi-talker Conversations: Modular and End-to-end Perspectives

D Raj - arXiv preprint arXiv:2402.08932, 2024 - arxiv.org

Since the first speech recognition systems were built more than 30 years ago, improvement
in voice technology has enabled applications such as smart assistants and automated …

Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts

X Wang, Z Chen, Y Shi, J Wu, N Kanda… - arXiv preprint arXiv …, 2022 - arxiv.org

Employing a monaural speech separation (SS) model as a front-end for automatic speech
recognition (ASR) involves balancing two kinds of trade-offs. First, while a larger model …

Cited by 1 · All 2 versions

[PDF] arxiv.org

SURT 2.0: Advances in Transducer-Based Multi-Talker Speech Recognition

D Raj, D Povey, S Khudanpur - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org

The Streaming Unmixing and Recognition Transducer (SURT) model was proposed recently
as an end-to-end approach for continuous, streaming, multi-talker speech recognition (ASR) …

Cited by 3 · All 6 versions

[PDF] unitn.it

Neural Enhancement Strategies for Robust Speech Processing

MNAM Nawar - 2023 - iris.unitn.it

In real-world scenarios, speech signals are often contaminated with environmental noises,
and reverberation, which degrades speech quality and intelligibility. Lately, the development …

[PDF] Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory

R Li, Z Xie, H Xu, Y Peng, H Liu, H Huang, ES Chng - isca-archive.org

Accent recognition (AR) is challenging due to the lack of training data as well as the accents
are entangled with speakers and regional characteristics. This paper aims to improve AR …

Cited by 1 · All 2 versions