Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Semi-supervised vocabulary-informed learning

Y Fu, L Sigal - Proceedings of the IEEE conference on computer …, 2016 - cv-foundation.org
Despite significant progress in object categorization in recent years, a number of important challenges remain; mainly, the ability to learn from limited labeled data and the ability to recognize …

Open vocabulary object detection with pseudo bounding-box labels

M Gao, C Xing, JC Niebles, J Li, R Xu, W Liu… - … on Computer Vision, 2022 - Springer
Despite great progress in object detection, most existing methods work only on a limited set
of object categories, due to the tremendous human effort needed for bounding-box …

Edadet: Open-vocabulary object detection using early dense alignment

C Shi, S Yang - Proceedings of the IEEE/CVF international …, 2023 - openaccess.thecvf.com
Vision-language models such as CLIP have boosted the performance of open-vocabulary
object detection, where the detector is trained on base categories but required to detect …

Open vocabulary semantic segmentation with patch aligned contrastive learning

J Mukhoti, TY Lin, O Poursaeed… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce Patch Aligned Contrastive Learning (PACL), a modified compatibility
function for CLIP's contrastive loss, intending to train an alignment between the patch tokens …

Extract free dense labels from clip

C Zhou, CC Loy, B Dai - European Conference on Computer Vision, 2022 - Springer
Contrastive Language-Image Pre-training (CLIP) has made a remarkable
breakthrough in open-vocabulary zero-shot image recognition. Many recent studies …

Non-contrastive learning meets language-image pre-training

J Zhou, L Dong, Z Gan, L Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive language-image pre-training (CLIP) serves as a de facto standard for aligning
images and texts. Nonetheless, the loose correlation between images and texts of web …

General object foundation model for images and videos at scale

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we present GLEE, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Learning to detect and segment for open vocabulary object detection

T Wang - Proceedings of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Open vocabulary object detection has been greatly advanced by the recent development of
vision-language pre-trained models, which help recognize novel objects with only …

Open-vocabulary object detection using captions

A Zareian, KD Rosa, DH Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Despite the remarkable accuracy of deep neural networks in object detection, they are costly
to train and scale due to supervision requirements. Particularly, learning more object …