- Academic resource search

[PDF] Recent advances in end-to-end automatic speech recognition

J Li - APSIPA Transactions on Signal and Information …, 2022 - nowpublishers.com

Recently, the speech community is seeing a significant trend of moving from deep neural
network based hybrid modeling to end-to-end (E2E) modeling for automatic speech …

Cited by 384 · Related articles · All 7 versions

[PDF] mlr.press

Robust speech recognition via large-scale weak supervision

A Radford, JW Kim, T Xu, G Brockman… - International …, 2023 - proceedings.mlr.press

We study the capabilities of speech processing systems trained simply to predict large
amounts of transcripts of audio on the internet. When scaled to 680,000 hours of multilingual …

Cited by 2917 · Related articles · All 11 versions

[PDF] jmlr.org

Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 216 · Related articles · All 3 versions

[PDF] arxiv.org

XLS-R: Self-supervised cross-lingual speech representation learning at scale

A Babu, C Wang, A Tjandra, K Lakhotia, Q Xu… - arXiv preprint arXiv …, 2021 - arxiv.org

This paper presents XLS-R, a large-scale model for cross-lingual speech representation
learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a …

Cited by 616 · Related articles · All 5 versions

[PDF] cambridge.org

Emerging trends: A gentle introduction to fine-tuning

KW Church, Z Chen, Y Ma - Natural Language Engineering, 2021 - cambridge.org

The previous Emerging Trends article (Church et al., 2021. Natural Language
Engineering 27(5), 631–645.) introduced deep nets to poets. Poets is an imperfect …

Cited by 65 · Related articles · All 4 versions

[PDF] mlr.press

Branchformer: Parallel MLP-attention architectures to capture local and global context for speech recognition and understanding

Y Peng, S Dalmia, I Lane… - … Conference on Machine …, 2022 - proceedings.mlr.press

Conformer has proven to be effective in many speech processing tasks. It combines the
benefits of extracting local dependencies using convolutions and global dependencies …

Cited by 138 · Related articles · All 8 versions

[PDF] arxiv.org

Layer-wise analysis of a self-supervised speech representation model

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Recently proposed self-supervised learning approaches have been successful for pre-
training speech representation models. The utility of these learned representations has been …

Cited by 282 · Related articles · All 5 versions

[PDF] arxiv.org

Torchaudio: Building blocks for audio and speech processing

YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …

Cited by 196 · Related articles · All 7 versions

[PDF] thecvf.com

Transformer-based multimodal information fusion for facial expression analysis

W Zhang, F Qiu, S Wang, H Zeng… - Proceedings of the …, 2022 - openaccess.thecvf.com

Human affective behavior analysis has received much attention in human-computer
interaction (HCI). In this paper, we introduce our submission to the CVPR 2022 Competition …

Cited by 102 · Related articles · All 7 versions

[PDF] arxiv.org

Speech emotion recognition using self-supervised features

E Morais, R Hoory, W Zhu, I Gat… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Self-supervised pre-trained features have consistently delivered state-of-art results in the
field of natural language processing (NLP); however, their merits in the field of speech …

Cited by 130 · Related articles · All 6 versions