Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

G Zhang, T Merritt, MS Ribeiro, B Tura-Vecino… - arXiv preprint arXiv …, 2023 - arxiv.org
Neural text-to-speech systems are often optimized on L1/L2 losses, which make strong
assumptions about the distributions of the target data space. Aiming to improve those …

[HTML] Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

G Zhang, T Merritt, S Ribeiro, BT Vecino… - 2023 - amazon.science

[PDF] Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

G Zhang, T Merritt, MS Ribeiro, B Tura-Vecino… - researchgate.net

Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

G Zhang, T Merritt, MS Ribeiro, B Tura-Vecino… - arXiv e …, 2023 - ui.adsabs.harvard.edu

[PDF] Comparing normalizing flows and diffusion models for prosody and acoustic modelling in text-to-speech

G Zhang, T Merritt, MS Ribeiro, B Tura-Vecino… - isca-archive.org