Hierarchical cross-modal talking face generation with dynamic pixel-wise loss

L Chen, RK Maddox, Z Duan… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We devise a cascade GAN approach to generate talking face video, which is robust to
different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead
of learning a direct mapping from audio to video frames, we propose first to transfer audio to
high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned
on the landmarks. Compared to a direct audio-to-image approach, our cascade approach
avoids fitting spurious correlations between audiovisual signals that are irrelevant to the …

Hierarchical cross-modal talking face generation with dynamic pixel-wise loss

L Chen, RK Maddox, Z Duan, C Xu - arXiv preprint arXiv:1905.03820, 2019 - arxiv.org
We devise a cascade GAN approach to generate talking face video, which is robust to
different face shapes, view angles, facial characteristics, and noisy audio conditions. Instead
of learning a direct mapping from audio to video frames, we propose first to transfer audio to
high-level structure, i.e., the facial landmarks, and then to generate video frames conditioned
on the landmarks. Compared to a direct audio-to-image approach, our cascade approach
avoids fitting spurious correlations between audiovisual signals that are irrelevant to the …