A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …
Visual in-context prompting
In-context prompting in large language models (LLMs) has become a prevalent approach to
improve zero-shot capabilities but this idea is less explored in the vision domain. Existing …
Language-conditioned detection transformer
JH Cho, P Krähenbühl - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We present a new open-vocabulary detection framework. Our framework uses both image-
level labels and detailed detection annotations when available. Our framework proceeds in …
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
We present T-Rex2, a highly practical model for open-set object detection. Previous open-
set object detection methods relying on text prompts effectively encapsulate the abstract …
OVMR: Open-Vocabulary Recognition with Multi-Modal References
The challenge of open-vocabulary recognition is that the model has no clue about the new
categories it is applied to. Existing works have proposed different methods to embed …
Exploring multi-modal contextual knowledge for open-vocabulary object detection
In this paper, we for the first time explore helpful multi-modal contextual knowledge to
understand novel categories for open-vocabulary object detection (OVD). The multi-modal …
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
This paper introduces Grounding DINO 1.5, a suite of advanced open-set object detection
models developed by IDEA Research, which aims to advance the "Edge" of open-set object …
Boosting segment anything model towards open-vocabulary learning
The recent Segment Anything Model (SAM) has emerged as a new paradigmatic vision
foundation model, showcasing potent zero-shot generalization and flexible prompting …
Learning Task-Aware Language-Image Representation for Class-Incremental Object Detection
Class-incremental object detection (CIOD) is a real-world desired capability, requiring an
object detector to continuously adapt to new tasks without forgetting learned ones, with the …
OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Open-vocabulary detection is a challenging task due to the requirement of detecting objects
based on class names, including those not encountered during training. Existing methods …