Object detection in 20 years: A survey

Z Zou, K Chen, Z Shi, Y Guo, J Ye - Proceedings of the IEEE, 2023 - ieeexplore.ieee.org
Object detection, as one of the most fundamental and challenging problems in computer
vision, has received great attention in recent years. Over the past two decades, we have …

Maple: Multi-modal prompt learning

MU Khattak, H Rasheed, M Maaz… - Proceedings of the …, 2023 - openaccess.thecvf.com
Pre-trained vision-language (VL) models such as CLIP have shown excellent generalization
ability to downstream tasks. However, they are sensitive to the choice of input text prompts …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for training deep neural
networks (DNNs), and they usually train a DNN for each single visual recognition task …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Self-regulating prompts: Foundational model adaptation without forgetting

MU Khattak, ST Wasim, M Naseer… - Proceedings of the …, 2023 - openaccess.thecvf.com
Prompt learning has emerged as an efficient alternative for fine-tuning foundational models,
such as CLIP, for various downstream tasks. Conventionally trained using the task-specific …

Detecting everything in the open world: Towards universal object detection

Z Wang, Y Li, X Chen, SN Lim… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we formally address universal object detection, which aims to detect every
scene and predict every category. The dependence on human annotations, the limited …

RSPrompter: Learning to prompt for remote sensing instance segmentation based on visual foundation model

K Chen, C Liu, H Chen, H Zhang, W Li… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Leveraging the extensive training data from SA-1B, the segment anything model (SAM)
demonstrates remarkable generalization and zero-shot capabilities. However, as a category …

Region-aware pretraining for open-vocabulary object detection with vision transformers

D Kim, A Angelova, W Kuo - … of the IEEE/CVF conference on …, 2023 - openaccess.thecvf.com
We present Region-aware Open-vocabulary Vision Transformers (RO-ViT), a
contrastive image-text pretraining recipe to bridge the gap between image-level pretraining …

Codet: Co-occurrence guided region-word alignment for open-vocabulary object detection

C Ma, Y Jiang, X Wen, Z Yuan… - Advances in neural …, 2024 - proceedings.neurips.cc
Deriving reliable region-word alignment from image-text pairs is critical to learn object-level
vision-language representations for open-vocabulary object detection. Existing methods …

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …