A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social network …
An analysis of graph convolutional networks and recent datasets for visual question answering
AA Yusuf, F Chong, M Xianling - Artificial Intelligence Review, 2022 - Springer
Graph neural network is a deep learning approach widely applied on structural and non-
structural scenarios due to its substantial performance and interpretability recently. In a non …
DocVQA: A dataset for VQA on document images
M Mathew, D Karatzas… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
We present a new dataset for Visual Question Answering (VQA) on document images called
DocVQA. The dataset consists of 50,000 questions defined on 12,000+ document images …
LaTr: Layout-aware transformer for scene-text VQA
We propose a novel multimodal architecture for Scene Text Visual Question Answering
(STVQA), named Layout-Aware Transformer (LaTr). The task of STVQA requires models to …
Room-and-object aware knowledge reasoning for remote embodied referring expression
The Remote Embodied Referring Expression (REVERIE) is a recently raised task
that requires an agent to navigate to and localise a referred remote object according to a …
Boosting visual question answering with context-aware knowledge aggregation
Given an image and a natural language question, Visual Question Answering (VQA) aims at
answering the textual question correctly. Most VQA approaches in literature targets at finding …
Open-vocabulary object detection with an open corpus
Existing open vocabulary object detection (OVD) works expand the object detector toward
open categories by replacing the classifier with the category text embeddings and optimizing …
Room-object entity prompting and reasoning for embodied referring expression
Given a high-level instruction, the task of Embodied Referring Expression (REVERIE)
requires an embodied agent to localise a remote referred object via navigating in the …
Beyond OCR + VQA: involving OCR into the flow for robust and accurate TextVQA
Text-based visual question answering (TextVQA) requires analyzing both the visual contents
and texts in an image to answer a question, which is more practical than general visual …
Learning situation hyper-graphs for video question answering
Answering questions about complex situations in videos requires not only capturing of the
presence of actors, objects, and their relations, but also the evolution of these relationships …