A survey on open-vocabulary detection and segmentation: Past, present, and future
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …
Uncovering prototypical knowledge for weakly open-vocabulary semantic segmentation
This paper studies the problem of weakly open-vocabulary semantic segmentation
(WOVSS), which learns to segment objects of arbitrary classes using mere image-text pairs …
LLaFS: When Large Language Models Meet Few-Shot Segmentation
This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in
few-shot segmentation. In contrast to the conventional few-shot segmentation methods that …
Turbo: Informativity-driven acceleration plug-in for vision-language models
Vision-Language Large Models (VLMs) have become the primary backbone of AI due to their
impressive performance. However, their expensive computation costs, i.e., throughput and …
LLMFormer: Large Language Model for Open-Vocabulary Semantic Segmentation
Open-vocabulary (OV) semantic segmentation has attracted increasing attention in recent
years, which aims to recognize objects in an open class set for real-world applications …
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Recent works on open-vocabulary 3D instance segmentation show strong promise, but at
the cost of slow inference speed and high computation requirements. This high computation …
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Vision-Language Large Models (VLMs) have recently become the primary backbone of AI due to their
impressive performance. However, their expensive computation costs, i.e., throughput and …
Open Panoramic Segmentation
Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional
spatial information crucial for scene understanding. However, it is not only costly to obtain …
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) has recently gained increasing attention with the development of vision …
Image to Multi-Modal Retrieval for Industrial Scenarios
We formally define a novel and valuable information retrieval task: image-to-multi-modal-retrieval
(IMMR), where the query is an image and the doc is an entity with both image and textual …