Towards open vocabulary learning: A survey
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …
A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …
BAL: Balancing diversity and novelty for active learning
The objective of Active Learning is to strategically label a subset of the dataset to maximize
performance within a predetermined labeling budget. In this study, we harness features …
OV-VG: A benchmark for open-vocabulary visual grounding
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …
MOODv2: Masked Image Modeling for Out-of-Distribution Detection
The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust in-
distribution (ID) representation, distinct from OOD samples. While previous methods …
AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation
A serious issue that harms the performance of zero-shot visual recognition is named
objective misalignment, i.e., the learning objective prioritizes improving the recognition …
Open Panoramic Segmentation
Panoramic images, capturing a 360° field of view (FoV), encompass
omnidirectional spatial information crucial for scene understanding. However, it is not only …
Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot
semantic segmentation. Existing CLIP-based approaches primarily utilize visual features …
QuickLLaMA: Query-aware Inference Acceleration for Large Language Models
The capacity of Large Language Models (LLMs) to comprehend and reason over long
contexts is pivotal for advancements in diverse fields. Yet, they still struggle with capturing …
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic
segmentation by aligning visual features with class embeddings through a transformer …