Described object detection: Liberating object detection with flexible expressions
Detecting objects based on language information is a popular task that includes Open-
Vocabulary object Detection (OVD) and Referring Expression Comprehension (REC). In this …
Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model
The ability of large language models (LLMs) to process visual inputs has given rise to
general-purpose vision systems unifying various vision-language (VL) tasks by instruction …
Referring Image Editing: Object-level Image Editing via Referring Expressions
Significant advancements have been made in image editing with the recent advance of the
Diffusion model. However most of the current methods primarily focus on global or subject …
Decoupling static and hierarchical motion perception for referring video segmentation
Referring video segmentation relies on natural language expressions to identify and
segment objects often emphasizing motion clues. Previous works treat a sentence as a …
A survey of methods for addressing the challenges of referring image segmentation
L Ji, Y Du, Y Dang, W Gao, H Zhang - Neurocomputing, 2024 - Elsevier
Referring image segmentation is guided by natural language descriptions to separate the
target objects in an image. This task is different from semantic segmentation and instance …
Groundhog: Grounding large language models to holistic segmentation
Most multimodal large language models (MLLMs) learn language-to-object grounding
through causal language modeling where grounded objects are captured by bounding …
DetToolChain: A new prompting paradigm to unleash detection ability of MLLM
We present DetToolChain, a novel prompting paradigm, to unleash the zero-shot object
detection ability of multimodal large language models (MLLMs), such as GPT-4V and …
Referring Expression Counting
Existing counting tasks are limited to the class level which don't account for fine-grained
details within the class. In real applications it often requires in-context or referring human …
Resmatch: Referring expression segmentation in a semi-supervised manner
Y Zang, C Fu, R Cao, D Zhu, M Zhang, W Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Referring expression segmentation (RES), a task that involves localizing specific instance-
level objects based on free-form linguistic descriptions, has emerged as a crucial frontier in …
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
We propose a new framework that automatically generates high-quality segmentation masks
with their referring expressions as pseudo supervisions for referring image segmentation …