InstanceRefer: Cooperative holistic understanding for visual grounding on point clouds through...

A survey on graph neural networks and graph transformers in computer vision: a task-oriented perspective

C Chen, Y Wu, Q Dai, HY Zhou, M Xu, S Yang… - arXiv preprint arXiv …, 2022 - arxiv.org

Graph Neural Networks (GNNs) have gained momentum in graph representation learning
and boosted the state of the art in a variety of areas, such as data mining (e.g., social …

Cited by 45 · Related articles · All 3 versions

Panoptic segmentation: A review

O Elharrouss, S Al-Maadeed, N Subramanian… - arXiv preprint arXiv …, 2021 - arxiv.org

Image segmentation for video analysis plays an essential role in different research fields
such as smart city, healthcare, computer vision and geoscience, and remote sensing …

Cited by 40 · Related articles · All 3 versions

Multi3DRefer: Grounding text description to multiple 3D objects

Y Zhang, ZM Gong, AX Chang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We introduce the task of localizing a flexible number of objects in real-world 3D scenes
using natural language descriptions. Existing 3D visual grounding tasks focus on localizing …

Cited by 26 · Related articles · All 7 versions

3DVG-Transformer: Relation modeling for visual grounding on point clouds

L Zhao, D Cai, L Sheng, D Xu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Visual grounding on 3D point clouds is an emerging vision and language task that benefits
various applications in understanding the 3D visual world. By formulating this task as a …

Cited by 119 · Related articles · All 5 versions

ScanQA: 3D question answering for spatial scene understanding

D Azuma, T Miyanishi, S Kurita… - Proceedings of the …, 2022 - openaccess.thecvf.com

We propose a new 3D spatial understanding task of 3D Question Answering (3D-QA). In the
3D-QA task, models receive visual information from the entire 3D scene of the rich RGB-D …

Cited by 101 · Related articles · All 6 versions

3DJCG: A unified framework for joint dense captioning and visual grounding on 3D point clouds

D Cai, L Zhao, J Zhang, L Sheng… - Proceedings of the …, 2022 - openaccess.thecvf.com

Observing that the 3D captioning task and the 3D grounding task contain both shared and
complementary information in nature, in this work, we propose a unified framework to jointly …

Cited by 77 · Related articles · All 3 versions

EDA: Explicit text-decoupling and dense alignment for 3D visual grounding

Y Wu, X Cheng, R Zhang, Z Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

3D visual grounding aims to find the object within point clouds mentioned by free-form
natural language descriptions with rich semantic cues. However, existing methods …

Cited by 52 · Related articles · All 5 versions

Multi-view transformer for 3D visual grounding

S Huang, Y Chen, J Jia, L Wang - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The 3D visual grounding task aims to ground a natural language description to the targeted
object in a 3D scene, which is usually represented in 3D point clouds. Previous works …

Cited by 81 · Related articles · All 5 versions

Context-aware alignment and mutual masking for 3D-language pre-training

Z Jin, M Hayat, Y Yang, Y Guo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

3D visual language reasoning plays an important role in effective human-computer
interaction. The current approaches for 3D visual reasoning are task-specific, and lack pre …

Cited by 28 · Related articles · All 3 versions

ViewRefer: Grasp the multi-view knowledge for 3D visual grounding

Z Guo, Y Tang, R Zhang, D Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Understanding 3D scenes from multi-view inputs has been proven to alleviate the view
discrepancy issue in 3D visual grounding. However, existing methods normally neglect the …

Cited by 18 · Related articles · All 3 versions