Samuele Cornell, Loren Lugosch, Cem Subakan, Nauman Dawalatabad, Abdelwahab Heba, Jianyuan...

A Pasad, JC Chou, K Livescu - 2021 IEEE Automatic Speech …, 2021 - ieeexplore.ieee.org

Recently proposed self-supervised learning approaches have been successful for pre-training
speech representation models. The utility of these learned representations has been …

Cited by 240 · Related articles · All 5 versions

TorchAudio: Building blocks for audio and speech processing

YY Yang, M Hira, Z Ni, A Astafurov… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

This document describes version 0.10 of TorchAudio: building blocks for machine learning
applications in the audio and speech processing domain. The objective of TorchAudio is to …

Cited by 179 · Related articles · All 7 versions

A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding

Y Wang, A Boumadane, A Heba - arXiv preprint arXiv:2111.02735, 2021 - arxiv.org

Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary
progress in Automatic Speech Recognition (ASR). However, they have not been totally …

Cited by 144 · Related articles · All 3 versions

WeSpeaker: A research and production oriented speaker embedding learning toolkit

H Wang, C Liang, S Wang, Z Chen… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org

Speaker modeling is essential for many related tasks, such as speaker recognition and
speaker diarization. The dominant modeling approach is fixed-dimensional vector …

Cited by 65 · Related articles · All 5 versions

An efficient encoder-decoder architecture with top-down attention for speech separation

K Li, R Yang, X Hu - arXiv preprint arXiv:2209.15200, 2022 - arxiv.org

Deep neural networks have shown excellent prospects in speech separation tasks.
However, obtaining good results while keeping a low model complexity remains challenging …

Cited by 37 · Related articles · All 3 versions

A first look into the carbon footprint of federated learning

X Qiu, T Parcollet, J Fernandez-Marques… - Journal of Machine …, 2023 - jmlr.org

Despite impressive results, deep learning-based technologies also raise severe privacy and
environmental concerns induced by the training procedure often conducted in data centers …

Cited by 72 · Related articles · All 6 versions

AdVerb: Visually guided audio dereverberation

S Chowdhury, S Ghosh, S Dasgupta… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present AdVerb, a novel audio-visual dereverberation framework that uses visual cues
in addition to the reverberant sound to estimate clean audio. Although audio-only …

Cited by 6 · Related articles · All 6 versions

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

X Qi, J Pan, P Li, R Yuan, X Chi, M Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Generating vivid and emotional 3D co-speech gestures is crucial for virtual avatar animation
in human-machine interaction applications. While the existing methods enable generating …

Cited by 3 · Related articles · All 4 versions

KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems

X Wu, S Ma, C Shen, C Lin, Q Wang, Q Li… - 32nd USENIX Security …, 2023 - usenix.org

Prior researchers show that existing automatic speech recognition (ASR) systems are
vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems …

Cited by 5 · Related articles · All 4 versions

PaddleSpeech: An easy-to-use all-in-one speech toolkit

H Zhang, T Yuan, J Chen, X Li, R Zheng… - arXiv preprint arXiv …, 2022 - arxiv.org

PaddleSpeech is an open-source all-in-one speech toolkit. It aims at facilitating the
development and research of speech processing technologies by providing an easy-to-use …

Cited by 22 · Related articles · All 5 versions