Intermediate loss regularization for CTC-based speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …

FunASR: A fundamental end-to-end speech recognition toolkit

Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper introduces FunASR, an open-source speech recognition toolkit designed to
bridge the gap between academic research and industrial applications. FunASR offers …

Deep learning based emotion recognition and visualization of figural representation

X Lu - Frontiers in psychology, 2022 - frontiersin.org
This exploration aims to study the emotion recognition of speech and graphic visualization of
expressions of learners under the intelligent learning environment of the Internet. After …

Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition

G Zheng, Y Xiao, K Gong, P Zhou, X Liang… - arXiv preprint arXiv …, 2021 - arxiv.org
Unifying acoustic and linguistic representation learning has become increasingly crucial to
transfer the knowledge learned on the abundance of high-resource language data for low …

Non-autoregressive ASR modeling using pre-trained language models for Chinese speech recognition

FH Yu, KY Chen, KH Lu - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Transformer-based models have led to significant innovation in various classic and practical
subjects, including speech processing, natural language processing, and computer vision …

Example-based explanations with adversarial attacks for respiratory sound analysis

Y Chang, Z Ren, TT Nguyen, W Nejdl… - arXiv preprint arXiv …, 2022 - arxiv.org
Respiratory sound classification is an important tool for remote screening of
respiratory-related diseases such as pneumonia, asthma, and COVID-19. To facilitate the interpretability …

Streaming chunk-aware multihead attention for online end-to-end speech recognition

S Zhang, Z Gao, H Luo, M Lei, J Gao, Z Yan… - arXiv preprint arXiv …, 2020 - arxiv.org
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more
and more attention. Many efforts have been paid to turn the non-streaming attention-based …

A CIF-based speech segmentation method for streaming E2E ASR

Y Shu, H Luo, S Zhang, L Wang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Long utterances segmentation is crucial in end-to-end (E2E) streaming automatic speech
recognition (ASR). However, commonly used voice activity detection (VAD)-based and fixed …

Extremely low footprint end-to-end ASR system for smart device

Z Gao, Y Yao, S Zhang, J Yang, M Lei… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate
the acoustic, pronunciation and language models into a single neural network, which …