Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Semi-supervised vocabulary-informed learning

Y Fu, L Sigal - Proceedings of the IEEE conference on computer …, 2016 - cv-foundation.org
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and the ability to recognize …

Open vocabulary object detection with pseudo bounding-box labels

M Gao, C Xing, JC Niebles, J Li, R Xu, W Liu… - … on Computer Vision, 2022 - Springer
Despite great progress in object detection, most existing methods work only on a limited set
of object categories, due to the tremendous human effort needed for bounding-box …

Edadet: Open-vocabulary object detection using early dense alignment

C Shi, S Yang - Proceedings of the IEEE/CVF international …, 2023 - openaccess.thecvf.com
Vision-language models such as CLIP have boosted the performance of open-vocabulary
object detection, where the detector is trained on base categories but required to detect …

Open vocabulary semantic segmentation with patch aligned contrastive learning

J Mukhoti, TY Lin, O Poursaeed… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility
function for CLIP's contrastive loss, intending to train an alignment between the patch tokens …

Extract free dense labels from clip

C Zhou, CC Loy, B Dai - European Conference on Computer Vision, 2022 - Springer
Contrastive Language-Image Pre-training (CLIP) has made a remarkable
breakthrough in open-vocabulary zero-shot image recognition. Many recent studies …

Non-contrastive learning meets language-image pre-training

J Zhou, L Dong, Z Gan, L Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive language-image pre-training (CLIP) serves as a de facto standard for aligning
images and texts. Nonetheless, the loose correlation between images and texts of web …

General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we present GLEE, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Learning to detect and segment for open vocabulary object detection

T Wang - Proceedings of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Open vocabulary object detection has been greatly advanced by the recent development of
vision-language pre-trained models, which help recognize novel objects with only …

Open-vocabulary object detection using captions

A Zareian, KD Rosa, DH Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Despite the remarkable accuracy of deep neural networks in object detection, they are costly
to train and scale due to supervision requirements. Particularly, learning more object …