Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

Bal: Balancing diversity and novelty for active learning

J Li, P Chen, S Yu, S Liu, J Jia - IEEE Transactions on Pattern …, 2023 - ieeexplore.ieee.org
The objective of Active Learning is to strategically label a subset of the dataset to maximize
performance within a predetermined labeling budget. In this study, we harness features …

Ov-vg: A benchmark for open-vocabulary visual grounding

C Wang, W Feng, X Li, G Cheng, S Lyu, B Liu, L Chen… - Neurocomputing, 2024 - Elsevier
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …

MOODv2: Masked Image Modeling for Out-of-Distribution Detection

J Li, P Chen, S Yu, S Liu, J Jia - IEEE transactions on pattern …, 2024 - ieeexplore.ieee.org
The crux of effective out-of-distribution (OOD) detection lies in acquiring a robust
in-distribution (ID) representation, distinct from OOD samples. While previous methods …

AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation

J Ge, L Xie, H Xie, P Li, X Zhang, Y Zhang… - European Conference on …, 2025 - Springer
A serious issue that harms the performance of zero-shot visual recognition is named
objective misalignment, i.e., the learning objective prioritizes improving the recognition …

Open Panoramic Segmentation

J Zheng, R Liu, Y Chen, K Peng, C Wu, K Yang… - … on Computer Vision, 2025 - Springer
Panoramic images, capturing a 360° field of view (FoV), encompass
omnidirectional spatial information crucial for scene understanding. However, it is not only …

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Y Li, ZY Li, Q Zeng, Q Hou, MM Cheng - arXiv preprint arXiv:2406.00670, 2024 - arxiv.org
Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot
semantic segmentation. Existing CLIP-based approaches primarily utilize visual features …

QuickLLaMA: Query-aware Inference Acceleration for Large Language Models

J Li, H Shi, X Jiang, Z Li, H Xu, J Jia - arXiv preprint arXiv:2406.07528, 2024 - arxiv.org
The capacity of Large Language Models (LLMs) to comprehend and reason over long
contexts is pivotal for advancements in diverse fields. Yet, they still struggle with capturing …

Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation

Z Zhang, T Zhang, Y Zhu, J Liu, X Liang, QX Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
The pre-trained vision-language model, exemplified by CLIP, advances zero-shot semantic
segmentation by aligning visual features with class embeddings through a transformer …