An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Sixty years of frequency-domain monaural speech enhancement: From traditional to deep learning methods

C Zheng, H Zhang, W Liu, X Luo, A Li, X Li… - Trends in …, 2023 - journals.sagepub.com
Frequency-domain monaural speech enhancement has been extensively studied for over
60 years, and a great number of methods have been proposed and applied to many …

Machine learning in acoustics: Theory and applications

MJ Bianco, P Gerstoft, J Traer, E Ozanich… - The Journal of the …, 2019 - pubs.aip.org
Acoustic data provide scientific and engineering insights in fields ranging from biology and
communications to ocean and Earth science. We survey the recent advances and …

Raw waveform-based speech enhancement by fully convolutional networks

SW Fu, Y Tsao, X Lu, H Kawai - 2017 Asia-Pacific Signal and …, 2017 - ieeexplore.ieee.org
This study proposes a fully convolutional network (FCN) model for raw waveform-based
speech enhancement. The proposed system performs speech enhancement in an end-to …

Interactive speech and noise modeling for speech enhancement

C Zheng, X Peng, Y Zhang, S Srinivasan… - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Speech enhancement is challenging because of the diversity of background noise types.
Most of the existing methods are focused on modelling the speech rather than the noise. In …

Hawkes processes for events in social media

MA Rizoiu, Y Lee, S Mishra, L Xie - Frontiers of multimedia research, 2017 - dl.acm.org
This chapter provides an accessible introduction for point processes, and especially Hawkes
processes, for modeling discrete, inter-dependent events over continuous time. We start by …

Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition

X Feng, Y Zhang, J Glass - 2014 IEEE international conference …, 2014 - ieeexplore.ieee.org
Denoising autoencoders (DAs) have shown success in generating robust features for
images, but there has been limited work in applying DAs for speech. In this paper we …

SNR-Aware Convolutional Neural Network Modeling for Speech Enhancement

SW Fu, Y Tsao, X Lu - Interspeech, 2016 - academia.edu
This paper proposes a signal-to-noise-ratio (SNR) aware convolutional neural network
(CNN) model for speech enhancement (SE). Because the CNN model can deal with local …

Deep learning for video classification and captioning

Z Wu, T Yao, Y Fu, YG Jiang - Frontiers of multimedia research, 2017 - dl.acm.org
Part I: Multimedia Content Analysis. 1. Deep Learning for Video Classification and …

Speech enhancement based on teacher–student deep learning using improved speech presence probability for noise-robust speech recognition

YH Tu, J Du, CH Lee - IEEE/ACM Transactions on Audio …, 2019 - ieeexplore.ieee.org
In this paper, we propose a novel teacher-student learning framework for the preprocessing
of a speech recognizer, leveraging the online noise tracking capabilities of improved minima …