General object foundation model for images and videos at scale
We present GLEE, an object-level foundation model for locating and identifying
objects in images and videos. Through a unified framework, GLEE accomplishes detection …
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Visual program synthesis is a promising approach to exploit the reasoning abilities of large
language models for compositional computer vision tasks. Previous work has used few-shot …
Taming Self-Training for Open-Vocabulary Object Detection
Recent studies have shown promising performance in open-vocabulary object detection
(OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs) …
Generating Enhanced Negatives for Training Language-Based Object Detectors
The recent progress in language-based open-vocabulary object detection can be largely
attributed to finding better ways of leveraging large-scale data with free-form text …
ELSA: Evaluating Localization of Social Activities in Urban Streets
Why do some streets attract more social activities than others? Is it due to street design, or
do land use patterns in neighborhoods create opportunities for businesses where people …
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Referring Expression Comprehension (REC) is a crucial cross-modal task that objectively
evaluates the capabilities of language understanding, image comprehension, and language …
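[CITATION][C] Referring Expression Comprehension in Environments with Multiple Visual Information (다중시각정보환경에서의지칭표현이해)
설한울, 장병탁 - Proceedings of the Korea Information Science Society Conference (한국정보과학회학술발표논문집), 2023 - dbpia.co.kr
Abstract: Referring Expression Comprehension is the problem of taking as input an image and
text that refers to an object in the image, and outputting the location of the referred object. However, this …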