A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu, S Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …

Panoptic segmentation: A review

O Elharrouss, S Al-Maadeed, N Subramanian… - arXiv preprint arXiv …, 2021 - arxiv.org
Image segmentation for video analysis plays an essential role in different research fields
such as smart cities, healthcare, computer vision, geoscience, and remote sensing …

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

3DVG-Transformer: Relation modeling for visual grounding on point clouds

L Zhao, D Cai, L Sheng, D Xu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Visual grounding on 3D point clouds is an emerging vision and language task that benefits
various applications in understanding the 3D visual world. By formulating this task as a …

ScanQA: 3D question answering for spatial scene understanding

D Azuma, T Miyanishi, S Kurita… - proceedings of the …, 2022 - openaccess.thecvf.com
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the
3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D …

3DJCG: A unified framework for joint dense captioning and visual grounding on 3D point clouds

D Cai, L Zhao, J Zhang, L Sheng… - Proceedings of the …, 2022 - openaccess.thecvf.com
Observing that the 3D captioning task and the 3D grounding task contain both shared and
complementary information in nature, in this work, we propose a unified framework to jointly …

EDA: Explicit text-decoupling and dense alignment for 3D visual grounding

Y Wu, X Cheng, R Zhang, Z Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D visual grounding aims to find the object within point clouds mentioned by free-
form natural language descriptions with rich semantic cues. However, existing methods …

Multi-view transformer for 3d visual grounding

S Huang, Y Chen, J Jia, L Wang - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The 3D visual grounding task aims to ground a natural language description to the targeted
object in a 3D scene, which is usually represented in 3D point clouds. Previous works …

Context-aware alignment and mutual masking for 3d-language pre-training

Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …

ViewRefer: Grasp the multi-view knowledge for 3D visual grounding

Z Guo, Y Tang, R Zhang, D Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …