A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation
Gestures that accompany speech are an essential part of natural and efficient embodied
human communication. The automatic generation of such co‐speech gestures is a long …
A review of eye gaze in virtual agents, social robotics and hci: Behaviour generation, user interaction and perception
A person's emotions and state of mind are apparent in their face and eyes. As a Latin
proverb states: 'The face is the portrait of the mind; the eyes, its informers'. This presents a …
Taming diffusion models for audio-driven co-speech gesture generation
Animating virtual avatars to make co-speech gestures facilitates various applications in
human-machine interaction. The existing methods mainly rely on generative adversarial …
Generating holistic 3d human motion from speech
This work addresses the problem of generating 3D holistic body motions from human
speech. Given a speech recording, we synthesize sequences of 3D body poses, hand …
Speech gesture generation from the trimodal context of text, audio, and speaker identity
For human-like agents, including virtual avatars and social robots, making proper gestures
while speaking is crucial in human-agent interaction. Co-speech gestures enhance …
Learning hierarchical cross-modal association for co-speech gesture generation
Generating speech-consistent body and gesture movements is a long-standing problem in
virtual avatar creation. Previous studies often synthesize pose movement in a holistic …
Learning individual styles of conversational gesture
Human speech is often accompanied by hand and arm gestures. We present a method for
cross-modal translation from "in-the-wild" monologue speech of a single speaker to their …
Livelyspeaker: Towards semantic-aware co-speech gesture generation
Gestures are non-verbal but important behaviors accompanying people's speech. While
previous methods are able to generate speech rhythm-synchronized gestures, the semantic …
Audio-driven facial animation by joint end-to-end learning of pose and emotion
We present a machine learning technique for driving 3D facial animation by audio input in
real time and with low latency. Our deep neural network learns a mapping from input …
ZeroEGGS: Zero‐shot Example‐based Gesture Generation from Speech
We present ZeroEGGS, a neural network framework for speech‐driven gesture generation
with zero‐shot style control by example. This means style can be controlled via only a short …