A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …

An analysis of graph convolutional networks and recent datasets for visual question answering

AA Yusuf, F Chong, M Xianling - Artificial Intelligence Review, 2022 - Springer
Graph neural network is a deep learning approach widely applied on structural and non-
structural scenarios due to its substantial performance and interpretability recently. In a non …

DocVQA: A dataset for VQA on document images

M Mathew, D Karatzas… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We present a new dataset for Visual Question Answering (VQA) on document images called
DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images …

LaTr: Layout-aware transformer for scene-text VQA

AF Biten, R Litman, Y Xie… - Proceedings of the …, 2022 - openaccess.thecvf.com
We propose a novel multimodal architecture for Scene Text Visual Question Answering
(STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to …

Room-and-object aware knowledge reasoning for remote embodied referring expression

C Gao, J Chen, S Liu, L Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com
The Remote Embodied Referring Expression (REVERIE) is a recently raised task
that requires an agent to navigate to and localise a referred remote object according to a …

Boosting visual question answering with context-aware knowledge aggregation

G Li, X Wang, W Zhu - Proceedings of the 28th ACM International …, 2020 - dl.acm.org
Given an image and a natural language question, Visual Question Answering (VQA) aims at
answering the textual question correctly. Most VQA approaches in literature targets at finding …

Open-vocabulary object detection with an open corpus

J Wang, H Zhang, H Hong, X Jin… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing open vocabulary object detection (OVD) works expand the object detector toward
open categories by replacing the classifier with the category text embeddings and optimizing …

Room-object entity prompting and reasoning for embodied referring expression

C Gao, S Liu, J Chen, L Wang, Q Wu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Given a high-level instruction, the task of Embodied Referring Expression (REVERIE)
requires an embodied agent to localise a remote referred object via navigating in the …

Beyond OCR + VQA: Involving OCR into the flow for robust and accurate TextVQA

G Zeng, Y Zhang, Y Zhou, X Yang - Proceedings of the 29th ACM …, 2021 - dl.acm.org
Text-based visual question answering (TextVQA) requires analyzing both the visual contents
and texts in an image to answer a question, which is more practical than general visual …

Learning situation hyper-graphs for video question answering

A Urooj, H Kuehne, B Wu, K Chheu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Answering questions about complex situations in videos requires not only capturing of the
presence of actors, objects, and their relations, but also the evolution of these relationships …