SAN-M: Memory equipped self-attention for end-to-end speech recognition

J Lee, S Watanabe - ICASSP 2021-2021 IEEE International …, 2021 - ieeexplore.ieee.org

We present a simple and efficient auxiliary loss function for automatic speech recognition
(ASR) based on the connectionist temporal classification (CTC) objective. The proposed …

Cited by 130 Related articles All 5 versions

Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition

Z Gao, S Zhang, I McLoughlin, Z Yan - arXiv preprint arXiv:2206.08317, 2022 - arxiv.org

Transformers have recently dominated the ASR field. Although able to yield good
performance, they involve an autoregressive (AR) decoder to generate tokens one by one …

Cited by 52 Related articles All 8 versions

FunASR: A fundamental end-to-end speech recognition toolkit

Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper introduces FunASR, an open-source speech recognition toolkit designed to
bridge the gap between academic research and industrial applications. FunASR offers …

Cited by 23 Related articles All 5 versions

Deep learning based emotion recognition and visualization of figural representation

X Lu - Frontiers in psychology, 2022 - frontiersin.org

This exploration aims to study the emotion recognition of speech and graphic visualization of
expressions of learners under the intelligent learning environment of the Internet. After …

Cited by 35 Related articles All 5 versions

Wav-BERT: Cooperative acoustic and linguistic representation learning for low-resource speech recognition

G Zheng, Y Xiao, K Gong, P Zhou, X Liang… - arXiv preprint arXiv …, 2021 - arxiv.org

Unifying acoustic and linguistic representation learning has become increasingly crucial to
transfer the knowledge learned on the abundance of high-resource language data for low …

Cited by 29 Related articles All 4 versions

Non-autoregressive ASR modeling using pre-trained language models for Chinese speech recognition

FH Yu, KY Chen, KH Lu - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org

Transformer-based models have led to significant innovation in various classic and practical
subjects, including speech processing, natural language processing, and computer vision …

Cited by 19 Related articles All 2 versions

Example-based explanations with adversarial attacks for respiratory sound analysis

Y Chang, Z Ren, TT Nguyen, W Nejdl… - arXiv preprint arXiv …, 2022 - arxiv.org

Respiratory sound classification is an important tool for remote screening of respiratory-
related diseases such as pneumonia, asthma, and COVID-19. To facilitate the interpretability …

Cited by 16 Related articles All 10 versions

Streaming chunk-aware multihead attention for online end-to-end speech recognition

S Zhang, Z Gao, H Luo, M Lei, J Gao, Z Yan… - arXiv preprint arXiv …, 2020 - arxiv.org

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more
and more attention. Many efforts have been paid to turn the non-streaming attention-based …

Cited by 30 Related articles All 8 versions

A CIF-based speech segmentation method for streaming E2E ASR

Y Shu, H Luo, S Zhang, L Wang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org

Long utterances segmentation is crucial in end-to-end (E2E) streaming automatic speech
recognition (ASR). However, commonly used voice activity detection (VAD)-based and fixed …

Cited by 6 Related articles All 2 versions

Extremely low footprint end-to-end ASR system for smart device

Z Gao, Y Yao, S Zhang, J Yang, M Lei… - arXiv preprint arXiv …, 2021 - arxiv.org

Recently, end-to-end (E2E) speech recognition has become popular, since it can integrate
the acoustic, pronunciation and language models into a single neural network, which …

Cited by 15 Related articles All 7 versions