WaveNeXt: ConvNeXt-based fast neural vocoder without iSTFT layer

T Okamoto, H Yamashita, Y Ohtani… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
A recently proposed neural vocoder, Vocos, can perform inference ten times faster than
HiFi-GAN because of its use of ConvNeXt layers that can predict high-resolution short-time …

Fast Neural Speech Waveform Generative Models With Fully-Connected Layer-Based Upsampling

H Yamashita, T Okamoto, R Takashima, Y Ohtani… - IEEE …, 2024 - ieeexplore.ieee.org
Although end-to-end (E2E) text-to-speech (TTS) models with HiFi-GAN-based neural
vocoder (e.g., VITS and JETS) can achieve human-like speech quality with fast inference …