Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward
Easy access to audio-visual content on social media, combined with the availability of
modern tools such as TensorFlow or Keras, and open-source trained models, along with …
Deep reinforcement learning: a survey
Deep reinforcement learning (RL) has become one of the most popular topics in artificial
intelligence research. It has been widely used in various fields, such as end-to-end control …
A survey on neural speech synthesis
Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural
speech given text, is a hot research topic in speech, language, and machine learning …
LibriTTS: A corpus derived from LibriSpeech for text-to-speech
This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech
use. It is derived from the original audio and text materials of the LibriSpeech corpus, which …
Transfer learning from speaker verification to multispeaker text-to-speech synthesis
We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to
generate speech audio in the voice of many different speakers, including those unseen …
Meta-StyleSpeech: Multi-speaker adaptive text-to-speech generation
With rapid progress in neural text-to-speech (TTS) models, personalized speech generation
is now in high demand for many applications. For practical applicability, a TTS model should …
AdaSpeech: Adaptive text to speech for custom voice
Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims
to adapt a source TTS model to synthesize personal voice for a target speaker using few …
Zero-shot multi-speaker text-to-speech with state-of-the-art neural speaker embeddings
While speaker adaptation for end-to-end speech synthesis using speaker embeddings can
produce good speaker similarity for speakers seen during training, there remains a gap for …
Mega-TTS: Zero-shot text-to-speech at scale with intrinsic inductive bias
Scaling text-to-speech to a large and wild dataset has been proven to be highly effective in
achieving timbre and speech style generalization, particularly in zero-shot TTS. However …
Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly
translate speech from one language into speech in another language, without relying on an …