Learning better visual dialog agents with pretrained visual-linguistic representation

Y Wong, S Fan, Y Guo, Z Xu, K Stephen… - Proceedings of the 30th …, 2022 - dl.acm.org

Man is by nature a social animal. One important facet of human evolution is through
narrative imagination, be it fictional or factual, and to tell the tale to other individuals. The …

Cited by 15 · All 2 versions

Change detection meets visual question answering

Z Yuan, L Mou, Z Xiong, XX Zhu - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

The Earth's surface is continually changing, and identifying changes plays an important role
in urban planning and sustainability. Although change detection techniques have been …

Cited by 53 · All 3 versions

InViG: Benchmarking Open-Ended Interactive Visual Grounding with 500K Dialogues

H Zhang, J Xu, Y Mo, T Kong - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates leading to reduced …

A survey on multimodal dialogue systems: recent advances and new frontiers

G Liu, S Wang, J Yu, J Yin - 2022 5th International Conference …, 2022 - ieeexplore.ieee.org

Recently, there has been growing interest in the field of multimodal dialogue systems.
Different from traditional unimodal dialogue systems, our task needs to understand the …

Cited by 11 · All 2 versions

HVLM: Exploring human-like visual cognition and language-memory network for visual dialog

K Sun, C Guo, H Zhang, Y Li - Information Processing & Management, 2022 - Elsevier

Visual dialog, a visual-language task, enables an AI agent to engage in conversation with
humans grounded in a given image. To generate appropriate answers for a series of …

Cited by 11 · All 2 versions

InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions

H Zhang, J Xu, Y Mo, T Kong - arXiv preprint arXiv:2310.12147, 2023 - arxiv.org

Ambiguity is ubiquitous in human communication. Previous approaches in Human-Robot
Interaction (HRI) have often relied on predefined interaction templates, leading to reduced …

Cited by 3 · All 3 versions

Pointing out human answer mistakes in a goal-oriented visual dialogue

R Oshima, S Shinagawa… - Proceedings of the …, 2023 - openaccess.thecvf.com

Effective communication between humans and intelligent agents has promising applications
for solving complex problems. One such approach is visual dialogue, which leverages …

Cited by 3 · All 7 versions

VisualHow: Multimodal problem solving

J Yang, X Chen, M Jiang, S Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com

Recent progress in the interdisciplinary studies of computer vision (CV) and natural
language processing (NLP) has enabled the development of intelligent systems that can …

Cited by 6 · All 8 versions

Artificial intelligence models do not ground negation, humans do. GuessWhat?! dialogues as a case study

A Testoni, C Greco, R Bernardi - Frontiers in Big Data, 2022 - frontiersin.org

Negation is widely present in human communication, yet it is largely neglected in the
research on conversational agents based on neural network architectures. Cognitive studies …

Cited by 7 · All 7 versions

SINet: Improving relational features in two-stage referring expression comprehension

W Guo, Y Zhang, X Yuan - Expert Systems with Applications, 2024 - Elsevier

Referring expression comprehension (REC) requires locating the region referred by the
expression, where one of the key challenges is to distinguish the correct object from other of …

Cited by 1