Speed-Aware Audio-Driven Speech Animation using Adaptive Windows

S Jung, Y Seol, K Seo, H Na, S Kim, V Tan… - ACM Transactions on …, 2024 - dl.acm.org
We present a novel method that can generate realistic speech animations of a 3D face from
audio using multiple adaptive windows. In contrast to previous studies that use a fixed size …

Speaking style conversion in the waveform domain using discrete self-supervised units

G Maimon, Y Adi - arXiv preprint arXiv:2212.09730, 2022 - arxiv.org
We introduce DISSC, a novel, lightweight method that converts the rhythm, pitch contour and
timbre of a recording to a target speaker in a textless manner. Unlike DISSC, most voice …

Rhythm Modeling for Voice Conversion

B van Niekerk, MA Carbonneau… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
Voice conversion aims to transform source speech into a different target voice. However,
typical voice conversion systems do not account for rhythm, which is an important factor in …

On Feature Importance and Interpretability of Speaker Representations

F Rautenberg, M Kuhlmann… - … 15th ITG Conference, 2023 - ieeexplore.ieee.org
Unsupervised speech disentanglement aims at separating fast-varying from slowly varying
components of a speech signal. In this contribution, we take a closer look at the embedding …