A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective
Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …
Panoptic segmentation: A review
Image segmentation for video analysis plays an essential role in different research fields
such as smart city, healthcare, computer vision and geoscience, and remote sensing …
Multi3DRefer: Grounding text description to multiple 3D objects
We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …
3DVG-Transformer: Relation modeling for visual grounding on point clouds
Visual grounding on 3D point clouds is an emerging vision and language task that benefits
various applications in understanding the 3D visual world. By formulating this task as a …
ScanQA: 3D question answering for spatial scene understanding
We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the
3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D …
3DJCG: A unified framework for joint dense captioning and visual grounding on 3D point clouds
Observing that the 3D captioning task and the 3D grounding task contain both shared and
complementary information in nature, in this work, we propose a unified framework to jointly …
EDA: Explicit text-decoupling and dense alignment for 3D visual grounding
3D visual grounding aims to find the object within point clouds mentioned by free-form
natural language descriptions with rich semantic cues. However, existing methods …
Multi-view transformer for 3D visual grounding
The 3D visual grounding task aims to ground a natural language description to the targeted
object in a 3D scene, which is usually represented in 3D point clouds. Previous works …
Context-aware alignment and mutual masking for 3D-language pre-training
3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …
ViewRefer: Grasp the multi-view knowledge for 3D visual grounding
Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …