TorchAudio: Building blocks for audio and speech processing

V Pratap, A Tjandra, B Shi, P Tomasello, A Babu… - Journal of Machine …, 2024 - jmlr.org

Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …

Cited by 205 · Related articles · All 3 versions

A high-performance neuroprosthesis for speech decoding and avatar control

SL Metzger, KT Littlejohn, AB Silva, DA Moses… - Nature, 2023 - nature.com

Speech neuroprostheses have the potential to restore communication to people living with
paralysis, but naturalistic speed and expressivity are elusive. Here we use high-density …

Cited by 166 · Related articles · All 9 versions

Decoding speech perception from non-invasive brain recordings

A Défossez, C Caucheteux, J Rapin, O Kabeli… - Nature Machine …, 2023 - nature.com

Decoding speech from brain activity is a long-awaited goal in both healthcare and
neuroscience. Invasive devices have recently led to major milestones in this regard: deep …

Cited by 107 · Related articles · All 9 versions

Investigating self-supervised learning for speech enhancement and separation

Z Huang, S Watanabe, S Yang, P García… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

Speech enhancement and separation are two fundamental tasks for robust speech
processing. Speech enhancement suppresses background noise while speech separation …

Cited by 64 · Related articles · All 6 versions

Imitator: Personalized speech-driven 3D facial animation

B Thambiraja, I Habibie, S Aliakbarian… - Proceedings of the …, 2023 - openaccess.thecvf.com

Speech-driven 3D facial animation has been widely explored, with applications in gaming,
character animation, virtual reality, and telepresence systems. State-of-the-art methods …

Cited by 32 · Related articles · All 7 versions

DPHuBERT: Joint distillation and pruning of self-supervised speech models

Y Peng, Y Sudo, S Muhammad, S Watanabe - arXiv preprint arXiv …, 2023 - arxiv.org

Self-supervised learning (SSL) has achieved notable success in many speech processing
tasks, but the large model size and heavy computational cost hinder the deployment …

Cited by 34 · Related articles · All 6 versions

Pruned RNN-T for fast, memory-efficient ASR training

F Kuang, L Guo, W Kang, L Lin, M Luo, Z Yao… - arXiv preprint arXiv …, 2022 - arxiv.org

The RNN-Transducer (RNN-T) framework for speech recognition has been growing in
popularity, particularly for deployed real-time ASR systems, because it combines high …

Cited by 53 · Related articles · All 10 versions

Music ControlNet: Multiple time-varying controls for music generation

SL Wu, C Donahue, S Watanabe… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …

Cited by 26 · Related articles · All 3 versions

TorchGeo: Deep learning with geospatial data

AJ Stewart, C Robinson, IA Corley, A Ortiz… - Proceedings of the 30th …, 2022 - dl.acm.org

Remotely sensed geospatial data are critical for applications including precision agriculture,
urban planning, disaster monitoring and response, and climate change research, among …

Cited by 61 · Related articles · All 5 versions

Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions

L Liu, P Zhou, G Sun, X Chen, T Wu, H Yu, M Guizani - Neurocomputing, 2023 - Elsevier

With the widespread use of distributed machine learning (DML), many IT companies have
established networks dedicated to DML. Different communication architectures of DML have …

Cited by 2 · Related articles · All 3 versions