Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
We propose a novel text-to-speech (TTS) framework centered around a neural transducer.
Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence …
Our approach divides the whole TTS pipeline into semantic-level sequence-to-sequence …
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Recent advancements in text-to-speech (TTS) powered by language models have
showcased remarkable capabilities in achieving naturalness and zero-shot voice cloning …
showcased remarkable capabilities in achieving naturalness and zero-shot voice cloning …