A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in deep learning era. Due to the expensive manual …

Uncovering prototypical knowledge for weakly open-vocabulary semantic segmentation

F Zhang, T Zhou, B Li, H He, C Ma… - Advances in …, 2023 - proceedings.neurips.cc
This paper studies the problem of weakly open-vocabulary semantic segmentation
(WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs …

LLaFS: When Large Language Models Meet Few-Shot Segmentation

L Zhu, T Chen, D Ji, J Ye, J Liu - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in
few-shot segmentation. In contrast to the conventional few-shot segmentation methods that …

Turbo: Informativity-driven acceleration plug-in for vision-language models

C Ju, H Wang, Z Li, X Chen, Z Zhai, W Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-Language Large Models (VLMs) have become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …

LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation

H Shi, SD Dao, J Cai - International Journal of Computer Vision, 2024 - Springer
Open-vocabulary (OV) semantic segmentation has attracted increasing attention in recent
years, which aims to recognize objects in an open class set for real-world applications …

Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation

MEA Boudjoghra, A Dai, J Lahoud, H Cholakkal… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent works on open-vocabulary 3D instance segmentation show strong promise, but at
the cost of slow inference speed and high computation requirements. This high computation …

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …

Open Panoramic Segmentation

J Zheng, R Liu, Y Chen, K Peng, C Wu, K Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional
spatial information crucial for scene understanding. However, it is not only costly to obtain …

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

H Cheng, C Ju, H Wang, J Liu, M Chen, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) recently gains increasing attention, with the development of vision …

Image to Multi-Modal Retrieval for Industrial Scenarios

Z Cheng, C Ju, X Chen, Z Zhai, S Xiao, X Zeng… - arXiv preprint arXiv …, 2023 - arxiv.org
We formally define a novel valuable information retrieval task: image-to-multi-modal-retrieval
(IMMR), where the query is an image and the doc is an entity with both image and textual …