A taxonomy of social cues for conversational agents

J Feine, U Gnewuch, S Morana, A Maedche - International Journal of …, 2019 - Elsevier
Conversational agents (CAs) are software-based systems designed to interact with humans
using natural language and have attracted considerable research interest in recent years …

A Comprehensive Review of Data‐Driven Co‐Speech Gesture Generation

S Nyatsanga, T Kucherenko, C Ahuja… - Computer Graphics …, 2023 - Wiley Online Library
Gestures that accompany speech are an essential part of natural and efficient embodied
human communication. The automatic generation of such co‐speech gestures is a long …

Taming diffusion models for audio-driven co-speech gesture generation

L Zhu, X Liu, X Liu, R Qian, Z Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Animating virtual avatars to make co-speech gestures facilitates various applications in
human-machine interaction. The existing methods mainly rely on generative adversarial …

GestureDiffuCLIP: Gesture diffusion model with CLIP latents

T Ao, Z Zhang, L Liu - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org
The automatic generation of stylized co-speech gestures has recently received increasing
attention. Previous systems typically allow style control via predefined text labels or example …

Learning hierarchical cross-modal association for co-speech gesture generation

X Liu, Q Wu, H Zhou, Y Xu, R Qian… - Proceedings of the …, 2022 - openaccess.thecvf.com
Generating speech-consistent body and gesture movements is a long-standing problem in
virtual avatar creation. Previous studies often synthesize pose movement in a holistic …

Learning individual styles of conversational gesture

S Ginosar, A Bar, G Kohavi, C Chan… - Proceedings of the …, 2019 - openaccess.thecvf.com
Human speech is often accompanied by hand and arm gestures. We present a method for
cross-modal translation from "in-the-wild" monologue speech of a single speaker to their …

Learning to listen: Modeling non-deterministic dyadic facial motion

E Ng, H Joo, L Hu, H Li, T Darrell… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present a framework for modeling interactional communication in dyadic conversations:
given multimodal inputs of a speaker, we autoregressively output multiple possibilities of …

Can language models learn to listen?

E Ng, S Subramanian, D Klein… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a framework for generating appropriate facial responses from a listener in
dyadic social interactions based on the speaker's words. Given an input transcription of the …

LivelySpeaker: Towards semantic-aware co-speech gesture generation

Y Zhi, X Cun, X Chen, X Shen, W Guo… - Proceedings of the …, 2023 - openaccess.thecvf.com
Gestures are non-verbal but important behaviors accompanying people's speech. While
previous methods are able to generate speech rhythm-synchronized gestures, the semantic …

From audio to photoreal embodiment: Synthesizing humans in conversations

E Ng, J Romero, T Bagautdinov, S Bai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a framework for generating full-bodied photorealistic avatars that gesture
according to the conversational dynamics of a dyadic interaction. Given speech audio we …