Speed-Aware Audio-Driven Speech Animation using Adaptive Windows
We present a novel method that can generate realistic speech animations of a 3D face from
audio using multiple adaptive windows. In contrast to previous studies that use a fixed size …
Speaking style conversion in the waveform domain using discrete self-supervised units
We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and
timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice …
Rhythm Modeling for Voice Conversion
B van Niekerk, MA Carbonneau… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Voice conversion aims to transform source speech into a different target voice. However,
typical voice conversion systems do not account for rhythm, which is an important factor in …
On Feature Importance and Interpretability of Speaker Representations
F Rautenberg, M Kuhlmann… - … 15th ITG Conference, 2023 - ieeexplore.ieee.org
Unsupervised speech disentanglement aims at separating fast varying from slowly varying
components of a speech signal. In this contribution, we take a closer look at the embedding …