Brain-computer interface: applications to speech decoding and synthesis to augment communication
Damage or degeneration of motor pathways necessary for speech and other movements, as
in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient …
ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit
T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-
TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …
[Book] Fundamentals of music processing: Audio, analysis, algorithms, applications
M Müller - 2015 - Springer
This textbook provides both profound technological knowledge and a comprehensive
treatment of essential topics in music processing and music information retrieval. Including …
Asteroid: the PyTorch-based audio source separation toolkit for researchers
This paper describes Asteroid, the PyTorch-based audio source separation toolkit for
researchers. Inspired by the most successful neural source separation systems, it provides …
Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve
Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …
Fast spectrogram inversion using multi-head convolutional neural networks
We propose the multi-head convolutional neural network (MCNN) for waveform synthesis
from spectrograms. Nonlinear interpolation in MCNN is employed with transposed …
CFAD: A Chinese dataset for fake audio detection
Fake audio detection is a growing concern and some relevant datasets have been designed
for research. However, there is no standard public Chinese dataset under complex …
Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges
Generative deep learning techniques have invaded the public discourse recently. Despite
the advantages, the applications to disinformation are concerning as the counter-measures …
A context encoder for audio inpainting
In this article, we study the ability of deep neural networks (DNNs) to restore missing audio
content based on its context, i.e., inpaint audio gaps. We focus on a condition which has not …
KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems
Prior researchers show that existing automatic speech recognition (ASR) systems are
vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems …