Compute to tell the tale: Goal-driven narrative generation
Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …
Change detection meets visual question answering
The Earth's surface is continually changing, and identifying changes plays an important role
in urban planning and sustainability. Although change detection techniques have been …
InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500K Dialogues
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates, leading to reduced …
A survey on multimodal dialogue systems: recent advances and new frontiers
G Liu, S Wang, J Yu, J Yin - 2022 5th International Conference …, 2022 - ieeexplore.ieee.org
Recently, there has been growing interest in the field of multimodal dialogue systems.
Different from traditional unimodal dialogue systems, our task needs to understand the …
HVLM: Exploring human-like visual cognition and language-memory network for visual dialog
K Sun, C Guo, H Zhang, Y Li - Information Processing & Management, 2022 - Elsevier
Visual dialog, a visual-language task, enables an AI agent to engage in conversation with
humans grounded in a given image. To generate appropriate answers for a series of …
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates, leading to reduced …
Pointing out human answer mistakes in a goal-oriented visual dialogue
R Oshima, S Shinagawa… - Proceedings of the …, 2023 - openaccess.thecvf.com
Effective communication between humans and intelligent agents has promising applications
for solving complex problems. One such approach is visual dialogue, which leverages …
VisualHow: Multimodal problem solving
Recent progress in the interdisciplinary studies of computer vision (CV) and natural
language processing (NLP) has enabled the development of intelligent systems that can …
Artificial intelligence models do not ground negation, humans do. GuessWhat?! dialogues as a case study
Negation is widely present in human communication, yet it is largely neglected in the
research on conversational agents based on neural network architectures. Cognitive studies …
SINet: Improving relational features in two-stage referring expression comprehension
W Guo, Y Zhang, X Yuan - Expert Systems with Applications, 2024 - Elsevier
Referring expression comprehension (REC) requires locating the region referred by the
expression, where one of the key challenges is to distinguish the correct object from other of …