Advancing referring expression segmentation beyond single image

C Xie, Z Zhang, Y Wu, F Zhu… - Advances in Neural …, 2024 - proceedings.neurips.cc

Detecting objects based on language information is a popular task that includes
Open-Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this …

Cited by 16 Related articles All 4 versions

Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model

S Pramanick, G Han, R Hou, S Nag… - Proceedings of the …, 2024 - openaccess.thecvf.com

The ability of large language models (LLMs) to process visual inputs has given rise to
general-purpose vision systems unifying various vision-language (VL) tasks by instruction …

Cited by 11 Related articles All 3 versions

Referring Image Editing: Object-level Image Editing via Referring Expressions

C Liu, X Li, H Ding - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Significant advancements have been made in image editing with the recent advance of the
Diffusion model. However most of the current methods primarily focus on global or subject …

Cited by 4 Related articles

Decoupling static and hierarchical motion perception for referring video segmentation

S He, H Ding - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Referring video segmentation relies on natural language expressions to identify and
segment objects often emphasizing motion clues. Previous works treat a sentence as a …

Cited by 11 Related articles All 3 versions

A survey of methods for addressing the challenges of referring image segmentation

L Ji, Y Du, Y Dang, W Gao, H Zhang - Neurocomputing, 2024 - Elsevier

Referring image segmentation is guided by natural language descriptions to separate the
target objects in an image. This task is different from semantic segmentation and instance …

Cited by 5 Related articles

Groundhog: Grounding large language models to holistic segmentation

Y Zhang, Z Ma, X Gao, S Shakiah… - Proceedings of the …, 2024 - openaccess.thecvf.com

Most multimodal large language models (MLLMs) learn language-to-object grounding
through causal language modeling where grounded objects are captured by bounding …

Cited by 13 Related articles All 6 versions

DetToolChain: A new prompting paradigm to unleash detection ability of MLLM

Y Wu, Y Wang, S Tang, W Wu, T He, W Ouyang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object
detection ability of multimodal large language models (MLLMs), such as GPT-4V and …

Cited by 5 Related articles All 3 versions

Referring Expression Counting

S Dai, J Liu, NM Cheung - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Existing counting tasks are limited to the class level which don't account for fine-grained
details within the class. In real applications it often requires in-context or referring human …

Cited by 2 Related articles

Resmatch: Referring expression segmentation in a semi-supervised manner

Y Zang, C Fu, R Cao, D Zhu, M Zhang, W Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Referring expression segmentation (RES), a task that involves localizing specific
instance-level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in …

Cited by 5 Related articles All 2 versions

Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation

S Yu, PH Seo, J Son - arXiv preprint arXiv:2407.07412, 2024 - arxiv.org

We propose a new framework that automatically generates high-quality segmentation masks
with their referring expressions as pseudo supervisions for referring image segmentation …

Cited by 1 Related articles All 3 versions