A taxonomy of social cues for conversational agents
Conversational agents (CAs) are software-based systems designed to interact with humans
using natural language and have attracted considerable research interest in recent years …
A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation
Gestures that accompany speech are an essential part of natural and efficient embodied
human communication. The automatic generation of such co‐speech gestures is a long …
Taming diffusion models for audio-driven co-speech gesture generation
Animating virtual avatars to make co-speech gestures facilitates various applications in
human-machine interaction. The existing methods mainly rely on generative adversarial …
GestureDiffuCLIP: Gesture diffusion model with CLIP latents
T Ao, Z Zhang, L Liu - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org
The automatic generation of stylized co-speech gestures has recently received increasing
attention. Previous systems typically allow style control via predefined text labels or example …
Learning hierarchical cross-modal association for co-speech gesture generation
Generating speech-consistent body and gesture movements is a long-standing problem in
virtual avatar creation. Previous studies often synthesize pose movement in a holistic …
Learning individual styles of conversational gesture
Human speech is often accompanied by hand and arm gestures. We present a method for
cross-modal translation from "in-the-wild" monologue speech of a single speaker to their …
Learning to listen: Modeling non-deterministic dyadic facial motion
We present a framework for modeling interactional communication in dyadic conversations:
given multimodal inputs of a speaker, we autoregressively output multiple possibilities of …
Can language models learn to listen?
E Ng, S Subramanian, D Klein… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a framework for generating appropriate facial responses from a listener in
dyadic social interactions based on the speaker's words. Given an input transcription of the …
LivelySpeaker: Towards semantic-aware co-speech gesture generation
Gestures are non-verbal but important behaviors accompanying people's speech. While
previous methods are able to generate speech rhythm-synchronized gestures, the semantic …
From audio to photoreal embodiment: Synthesizing humans in conversations
We present a framework for generating full-bodied photorealistic avatars that gesture
according to the conversational dynamics of a dyadic interaction. Given speech audio we …