Scaling speech technology to 1,000+ languages

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

A high-performance neuroprosthesis for speech decoding and avatar control

SL Metzger, KT Littlejohn, AB Silva, DA Moses… - Nature, 2023 - nature.com
Speech neuroprostheses have the potential to restore communication to people living with
paralysis, but naturalistic speed and expressivity are elusive. Here we use high-density …

Decoding speech perception from non-invasive brain recordings

A Défossez, C Caucheteux, J Rapin, O Kabeli… - Nature Machine …, 2023 - nature.com
Decoding speech from brain activity is a long-awaited goal in both healthcare and
neuroscience. Invasive devices have recently led to major milestones in this regard: deep …

Investigating self-supervised learning for speech enhancement and separation

Z Huang, S Watanabe, S Yang, P García… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speech enhancement and separation are two fundamental tasks for robust speech
processing. Speech enhancement suppresses background noise while speech separation …

Imitator: Personalized speech-driven 3D facial animation

B Thambiraja, I Habibie, S Aliakbarian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Speech-driven 3D facial animation has been widely explored, with applications in gaming,
character animation, virtual reality, and telepresence systems. State-of-the-art methods …

DPHuBERT: Joint distillation and pruning of self-supervised speech models

Y Peng, Y Sudo, S Muhammad, S Watanabe - arXiv preprint arXiv …, 2023 - arxiv.org
Self-supervised learning (SSL) has achieved notable success in many speech processing
tasks, but the large model size and heavy computational cost hinder the deployment …

Pruned RNN-T for fast, memory-efficient ASR training

F Kuang, L Guo, W Kang, L Lin, M Luo, Z Yao… - arXiv preprint arXiv …, 2022 - arxiv.org
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in
popularity, particularly for deployed real-time ASR systems, because it combines high …

Music ControlNet: Multiple time-varying controls for music generation

SL Wu, C Donahue, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …

TorchGeo: deep learning with geospatial data

AJ Stewart, C Robinson, IA Corley, A Ortiz… - Proceedings of the 30th …, 2022 - dl.acm.org
Remotely sensed geospatial data are critical for applications including precision agriculture,
urban planning, disaster monitoring and response, and climate change research, among …

Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

L Liu, P Zhou, G Sun, X Chen, T Wu, H Yu, M Guizani - Neurocomputing, 2023 - Elsevier
With the widespread use of distributed machine learning (DML), many IT companies have
established networks dedicated to DML. Different communication architectures of DML have …