Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition
Accents pose significant challenges for speech recognition systems. Although joint
automatic speech recognition (ASR) and accent recognition (AR) training has been proven …
Leveraging phone-level linguistic-acoustic similarity for utterance-level pronunciation scoring
Recent studies on pronunciation scoring have explored the effect of introducing phone
embeddings as reference pronunciation, but mostly in an implicit manner, i.e., addition or …
MBCFNet: A Multimodal Brain–Computer Fusion Network for human intention recognition
Accurate recognition of human intent is crucial for effective human–computer speech
interaction. Numerous intent understanding studies were based on speech-to-text …
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Despite notable advancements in automatic speech recognition (ASR), performance tends
to degrade when faced with adverse conditions. Generative error correction (GER) …
Self-supervised Learning Representation based Accent Recognition with Persistent Accent Memory
Accent recognition (AR) is challenging due to the lack of training data as well as the accents
are entangled with speakers and regional characteristics. This paper aims to improve AR …