Intermediate loss regularization for CTC-based speech recognition
J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org
We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …
Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition
Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …
FunASR: A fundamental end-to-end speech recognition toolkit
This paper introduces FunASR, an open-source speech recognition toolkit designed to
bridge the gap between academic research and industrial applications. FunASR offers …
Deep learning based emotion recognition and visualization of figural representation
X Lu - Frontiers in psychology, 2022 - frontiersin.org
This exploration aims to study the emotion recognition of speech and graphic visualization of
expressions of learners under the intelligent learning environment of the Internet. After …
Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition
Unifying acoustic and linguistic representation learning has become increasingly crucial to
transfer the knowledge learned on the abundance of high-resource language data for low …
Non-autoregressive ASR modeling using pre-trained language models for Chinese speech recognition
Transformer-based models have led to significant innovation in various classic and practical
subjects, including speech processing, natural language processing, and computer vision …
Example-based explanations with adversarial attacks for respiratory sound analysis
Respiratory sound classification is an important tool for remote screening of respiratory-
related diseases such as pneumonia, asthma, and COVID-19. To facilitate the interpretability …
Streaming chunk-aware multihead attention for online end-to-end speech recognition
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more
and more attention. Many efforts have been paid to turn the non-streaming attention-based …
A CIF-based speech segmentation method for streaming E2E ASR
Long utterances segmentation is crucial in end-to-end (E2E) streaming automatic speech
recognition (ASR). However, commonly used voice activity detection (VAD)-based and fixed …
Extremely low footprint end-to-end ASR system for smart device
Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate
the acoustic, pronunciation and language models into a single neural network, which …