Brain-computer interface: applications to speech decoding and synthesis to augment communication

S Luo, Q Rabbani, NE Crone - Neurotherapeutics, 2022 - Elsevier
Damage or degeneration of motor pathways necessary for speech and other movements, as
in brainstem strokes or amyotrophic lateral sclerosis (ALS), can interfere with efficient …

ESPnet-TTS: Unified, reproducible, and integratable open source end-to-end text-to-speech toolkit

T Hayashi, R Yamamoto, K Inoue… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named
ESPnet-TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit …

[BOOK] Fundamentals of music processing: Audio, analysis, algorithms, applications

M Müller - 2015 - Springer
This textbook provides both profound technological knowledge and a comprehensive
treatment of essential topics in music processing and music information retrieval. Including …

Asteroid: the PyTorch-based audio source separation toolkit for researchers

M Pariente, S Cornell, J Cosentino… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper describes Asteroid, the PyTorch-based audio source separation toolkit for
researchers. Inspired by the most successful neural source separation systems, it provides …

Video and audio deepfake datasets and open issues in deepfake technology: being ahead of the curve

Z Akhtar, TL Pendyala, VS Athmakuri - Forensic Sciences, 2024 - mdpi.com
The revolutionary breakthroughs in Machine Learning (ML) and Artificial Intelligence (AI) are
extensively being harnessed across a diverse range of domains, e.g., forensic science …

Fast spectrogram inversion using multi-head convolutional neural networks

SÖ Arık, H Jun, G Diamos - IEEE Signal Processing Letters, 2018 - ieeexplore.ieee.org
We propose the multi-head convolutional neural network (MCNN) for waveform synthesis
from spectrograms. Nonlinear interpolation in MCNN is employed with transposed …

CFAD: A Chinese dataset for fake audio detection

H Ma, J Yi, C Wang, X Yan, J Tao, T Wang… - Speech …, 2024 - Elsevier
Fake audio detection is a growing concern and some relevant datasets have been designed
for research. However, there is no standard public Chinese dataset under complex …

Generation and detection of manipulated multimodal audiovisual content: Advances, trends and open challenges

H Liz-Lopez, M Keita, A Taleb-Ahmed, A Hadid… - Information …, 2024 - Elsevier
Generative deep learning techniques have invaded the public discourse recently. Despite
the advantages, the applications to disinformation are concerning as the counter-measures …

A context encoder for audio inpainting

A Marafioti, N Perraudin, N Holighaus… - … /ACM Transactions on …, 2019 - ieeexplore.ieee.org
In this article, we study the ability of deep neural networks (DNNs) to restore missing audio
content based on its context, i.e., inpaint audio gaps. We focus on a condition which has not …

KENKU: Towards Efficient and Stealthy Black-box Adversarial Attacks against ASR Systems

X Wu, S Ma, C Shen, C Lin, Q Wang, Q Li… - 32nd USENIX Security …, 2023 - usenix.org
Prior researchers show that existing automatic speech recognition (ASR) systems are
vulnerable to adversarial examples. Most existing adversarial attacks against ASR systems …