Real-time intermediate flow estimation for video frame interpolation
Real-time video frame interpolation (VFI) is very useful in video processing, media players,
and display devices. We propose RIFE, a Real-time Intermediate Flow Estimation algorithm …
Deep person generation: A survey from the perspective of face, pose, and cloth synthesis
Deep person generation has attracted extensive research attention due to its wide
applications in virtual agents, video conferencing, online shopping, and art/movie …
Can language models learn to listen?
E Ng, S Subramanian, D Klein… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a framework for generating appropriate facial responses from a listener in
dyadic social interactions based on the speaker's words. Given an input transcription of the …
From audio to photoreal embodiment: Synthesizing humans in conversations
We present a framework for generating full-bodied photorealistic avatars that gesture
according to the conversational dynamics of a dyadic interaction. Given speech audio we …
Emotional listener portrait: Neural listener head generation with emotion
Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a
listener in reference to the information delivered by a speaker. A significant challenge when …
ReactFace: Multiple appropriate facial reaction generation in dyadic interactions
In dyadic interaction, predicting the listener's facial reactions is challenging as different
reactions may be appropriate in response to the same speaker's behaviour. This paper …
Reversible graph neural network-based reaction distribution learning for multiple appropriate facial reactions generation
Generating facial reactions in a human-human dyadic interaction is complex and highly
dependent on the context since more than one facial reaction can be appropriate for the …
MFR-Net: Multi-faceted responsive listening head generation via denoising diffusion model
Face-to-face communication is a common scenario including roles of speakers and
listeners. Most existing research methods focus on producing speaker videos, while the …
Audio-driven talking head generation with transformer and 3D morphable model
In the task of talking head generation, it is hard to learn the mapping relationship between
generated head image and input audio signal. To tackle this challenge, we propose to learn …
Emotional listener portrait: Realistic listener motion simulation in conversation
Listener head generation centers on generating non-verbal behaviors (e.g., smile) of a
listener in reference to the information delivered by a speaker. A significant challenge when …