Scaling speech technology to 1,000+ languages
Expanding the language coverage of speech technology has the potential to improve
access to information for many more people. However, current speech technology is …
A high-performance neuroprosthesis for speech decoding and avatar control
Speech neuroprostheses have the potential to restore communication to people living with
paralysis, but naturalistic speed and expressivity are elusive. Here we use high-density …
Decoding speech perception from non-invasive brain recordings
Decoding speech from brain activity is a long-awaited goal in both healthcare and
neuroscience. Invasive devices have recently led to major milestones in this regard: deep …
Investigating self-supervised learning for speech enhancement and separation
Speech enhancement and separation are two fundamental tasks for robust speech
processing. Speech enhancement suppresses background noise while speech separation …
Imitator: Personalized speech-driven 3D facial animation
Speech-driven 3D facial animation has been widely explored, with applications in gaming,
character animation, virtual reality, and telepresence systems. State-of-the-art methods …
DPHuBERT: Joint distillation and pruning of self-supervised speech models
Self-supervised learning (SSL) has achieved notable success in many speech processing
tasks, but the large model size and heavy computational cost hinder the deployment …
Pruned RNN-T for fast, memory-efficient ASR training
The RNN-Transducer (RNN-T) framework for speech recognition has been growing in
popularity, particularly for deployed real-time ASR systems, because it combines high …
Music ControlNet: Multiple time-varying controls for music generation
Text-to-music generation models are now capable of generating high-quality music audio in
broad styles. However, text control is primarily suitable for the manipulation of global musical …
TorchGeo: Deep learning with geospatial data
Remotely sensed geospatial data are critical for applications including precision agriculture,
urban planning, disaster monitoring and response, and climate change research, among …
Topologies in distributed machine learning: Comprehensive survey, recommendations and future directions
With the widespread use of distributed machine learning (DML), many IT companies have
established networks dedicated to DML. Different communication architectures of DML have …