Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

Q Shao, P Guo, J Yan, P Hu… - IEEE/ACM Transactions on …, 2023 - ieeexplore.ieee.org
Accents pose significant challenges for speech recognition systems. Although joint
automatic speech recognition (ASR) and accent recognition (AR) training has been proven …

Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring

W Liu, K Fu, X Tian, S Shi, W Li, Z Ma… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Recent studies on pronunciation scoring have explored the effect of introducing phone
embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or …

MBCFNet: A Multimodal Brain–Computer Fusion Network for human intention recognition

Z Li, G Zhang, S Okada, L Wang, B Zhao… - Knowledge-Based …, 2024 - Elsevier
Accurate recognition of human intent is crucial for effective human–computer speech
interaction. Numerous intent understanding studies were based on speech-to-text …

MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition

B Mu, Y Li, Q Shao, K Wei, X Wan, N Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite notable advancements in automatic speech recognition (ASR), performance tends
to degrade when faced with adverse conditions. Generative error correction (GER) …

Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory

R Li, Z Xie, H Xu, Y Peng, H Liu, H Huang, ES Chng - isca-archive.org
Accent recognition (AR) is challenging due to the lack of training data and because accents
are entangled with speakers and regional characteristics. This paper aims to improve AR …