Compute to tell the tale: Goal-driven narrative generation

Y Wong, S Fan, Y Guo, Z Xu, K Stephen… - Proceedings of the 30th …, 2022 - dl.acm.org
Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …

Change detection meets visual question answering

Z Yuan, L Mou, Z Xiong, XX Zhu - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The Earth's surface is continually changing, and identifying changes plays an important role
in urban planning and sustainability. Although change detection techniques have been …

InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500K Dialogues

H Zhang, J Xu, Y Mo, T Kong - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates leading to reduced …

A survey on multimodal dialogue systems: recent advances and new frontiers

G Liu, S Wang, J Yu, J Yin - 2022 5th International Conference …, 2022 - ieeexplore.ieee.org
Recently, there has been growing interest in the field of multimodal dialogue systems.
Different from traditional unimodal dialogue systems, our task needs to understand the …

HVLM: Exploring human-like visual cognition and language-memory network for visual dialog

K Sun, C Guo, H Zhang, Y Li - Information Processing & Management, 2022 - Elsevier
Visual dialog, a visual-language task, enables an AI agent to engage in conversation with
humans grounded in a given image. To generate appropriate answers for a series of …

InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions

H Zhang, J Xu, Y Mo, T Kong - arXiv preprint arXiv:2310.12147, 2023 - arxiv.org
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates, leading to reduced …

Pointing out human answer mistakes in a goal-oriented visual dialogue

R Oshima, S Shinagawa… - Proceedings of the …, 2023 - openaccess.thecvf.com
Effective communication between humans and intelligent agents has promising applications
for solving complex problems. One such approach is visual dialogue, which leverages …

VisualHow: Multimodal problem solving

J Yang, X Chen, M Jiang, S Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recent progress in the interdisciplinary studies of computer vision (CV) and natural
language processing (NLP) has enabled the development of intelligent systems that can …

Artificial intelligence models do not ground negation, humans do. GuessWhat?! dialogues as a case study

A Testoni, C Greco, R Bernardi - Frontiers in Big Data, 2022 - frontiersin.org
Negation is widely present in human communication, yet it is largely neglected in the
research on conversational agents based on neural network architectures. Cognitive studies …

SINet: Improving relational features in two-stage referring expression comprehension

W Guo, Y Zhang, X Yuan - Expert Systems with Applications, 2024 - Elsevier
Referring expression comprehension (REC) requires locating the region referred by the
expression, where one of the key challenges is to distinguish the correct object from other of …