Real-time intermediate flow estimation for video frame interpolation

Z Huang, T Zhang, W Heng, B Shi, S Zhou - European Conference on …, 2022 - Springer
Real-time video frame interpolation (VFI) is very useful in video processing, media players,
and display devices. We propose RIFE, a Real-time Intermediate Flow Estimation algorithm …

Deep person generation: A survey from the perspective of face, pose, and cloth synthesis

T Sha, W Zhang, T Shen, Z Li, T Mei - ACM Computing Surveys, 2023 - dl.acm.org
Deep person generation has attracted extensive research attention due to its wide
applications in virtual agents, video conferencing, online shopping, and art/movie …

Can language models learn to listen?

E Ng, S Subramanian, D Klein… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a framework for generating appropriate facial responses from a listener in
dyadic social interactions based on the speaker's words. Given an input transcription of the …

From audio to photoreal embodiment: Synthesizing humans in conversations

E Ng, J Romero, T Bagautdinov, S Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a framework for generating full-bodied photorealistic avatars that gesture
according to the conversational dynamics of a dyadic interaction. Given speech audio we …

Emotional listener portrait: Neural listener head generation with emotion

L Song, G Yin, Z Jin, X Dong… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a
listener in reference to the information delivered by a speaker. A significant challenge when …

ReactFace: Multiple appropriate facial reaction generation in dyadic interactions

C Luo, S Song, W Xie, M Spitale, L Shen… - arXiv preprint arXiv …, 2023 - arxiv.org
In dyadic interaction, predicting the listener's facial reactions is challenging as different
reactions may be appropriate in response to the same speaker's behaviour. This paper …

Reversible graph neural network-based reaction distribution learning for multiple appropriate facial reactions generation

T Xu, M Spitale, H Tang, L Liu, H Gunes… - arXiv preprint arXiv …, 2023 - arxiv.org
Generating facial reactions in a human-human dyadic interaction is complex and highly
dependent on the context since more than one facial reaction can be appropriate for the …

MFR-Net: Multi-faceted responsive listening head generation via denoising diffusion model

J Liu, X Wang, X Fu, Y Chai, C Yu, J Dai… - Proceedings of the 31st …, 2023 - dl.acm.org
Face-to-face communication is a common scenario including roles of speakers and
listeners. Most existing research methods focus on producing speaker videos, while the …

Audio-driven talking head generation with transformer and 3D morphable model

R Huang, W Zhong, G Li - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
In the task of talking head generation, it is hard to learn the mapping relationship between
generated head image and input audio signal. To tackle this challenge, we propose to learn …

Emotional listener portrait: Realistic listener motion simulation in conversation

L Song, G Yin, Z Jin, X Dong… - 2023 IEEE/CVF …, 2023 - ieeexplore.ieee.org
Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a
listener in reference to the information delivered by a speaker. A significant challenge when …