OmniLabel: A challenging benchmark for language-based object detection

J Wu, Y Jiang, Q Liu, Z Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com

We present GLEE in this work, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …

Cited by 33 · All 3 versions


Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

Z Khan, VK BG, S Schulter, Y Fu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Visual program synthesis is a promising approach to exploit the reasoning abilities of large
language models for compositional computer vision tasks. Previous work has used few-shot …

Cited by 5 · All 5 versions


Taming Self-Training for Open-Vocabulary Object Detection

S Zhao, S Schulter, L Zhao, Z Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent studies have shown promising performance in open-vocabulary object detection
(OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs) …

Cited by 6 · All 4 versions


Generating Enhanced Negatives for Training Language-Based Object Detectors

S Zhao, L Zhao, Y Suh, DN Metaxas… - Proceedings of the …, 2024 - openaccess.thecvf.com

The recent progress in language-based open-vocabulary object detection can be largely
attributed to finding better ways of leveraging large-scale data with free-form text …

Cited by 5 · All 3 versions


ELSA: Evaluating Localization of Social Activities in Urban Streets

M Hosseini, M Cipriano, S Eslami, D Hodczak… - arXiv preprint arXiv …, 2024 - arxiv.org

Why do some streets attract more social activities than others? Is it due to street design, or
do land use patterns in neighborhoods create opportunities for businesses where people …

FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension

J Liu, X Yang, W Li, P Wang - arXiv preprint arXiv:2409.14750, 2024 - arxiv.org

Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively
evaluates the capabilities of language understanding, image comprehension, and language …

[CITATION][C] Referring Expression Comprehension in Multi-Visual-Information Environments (다중 시각 정보 환경에서의 지칭 표현 이해)

설한울, 장병탁 - Proceedings of the KIISE Conference (한국정보과학회 학술발표논문집), 2023 - dbpia.co.kr

Abstract: Referring Expression Comprehension is the task of taking as input an image and a text that refers to
an object in that image, and outputting the location of the referred object. However, this …