A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Visual in-context prompting

F Li, Q Jiang, H Zhang, T Ren, S Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In-context prompting in large language models (LLMs) has become a prevalent approach to
improve zero-shot capabilities but this idea is less explored in the vision domain. Existing …

Language-conditioned detection transformer

JH Cho, P Krähenbühl - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We present a new open-vocabulary detection framework. Our framework uses both image-level
labels and detailed detection annotations when available. Our framework proceeds in …

T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Q Jiang, F Li, Z Zeng, T Ren, S Liu, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
We present T-Rex2, a highly practical model for open-set object detection. Previous open-set
object detection methods relying on text prompts effectively encapsulate the abstract …

OVMR: Open-Vocabulary Recognition with Multi-Modal References

Z Ma, S Zhang, L Wei, Q Tian - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The challenge of open-vocabulary recognition lies in the fact that the model has no clue of the new
categories it is applied to. Existing works have proposed different methods to embed …

Exploring multi-modal contextual knowledge for open-vocabulary object detection

Y Xu, M Zhang, X Yang, C Xu - arXiv preprint arXiv:2308.15846, 2023 - arxiv.org
In this paper, we for the first time explore helpful multi-modal contextual knowledge to
understand novel categories for open-vocabulary object detection (OVD). The multi-modal …

Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection

T Ren, Q Jiang, S Liu, Z Zeng, W Liu, H Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection
models developed by IDEA Research, which aims to advance the "Edge" of open-set object …

Boosting segment anything model towards open-vocabulary learning

X Han, L Wei, X Yu, Z Dou, X He, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision
foundation model, showcasing potent zero-shot generalization and flexible prompting …

Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection

H Zhang, BB Gao, Y Zeng, X Tian, X Tan… - Proceedings of the …, 2024 - ojs.aaai.org
Class-incremental object detection (CIOD) is a real-world desired capability, requiring an
object detector to continuously adapt to new tasks without forgetting learned ones, with the …

OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion

H Wang, P Ren, Z Jie, X Dong, C Feng, Y Qian… - arXiv preprint arXiv …, 2024 - arxiv.org
Open-vocabulary detection is a challenging task due to the requirement of detecting objects
based on class names, including those not encountered during training. Existing methods …