VizWiz-Priv: A dataset for recognizing the presence and purpose of private visual information...

B Meden, P Rot, P Terhörst, N Damer… - IEEE Transactions …, 2021 - ieeexplore.ieee.org

Biometric recognition technology has made significant advances over the last decade and is
now used across a number of services and applications. However, this widespread …

Cited by 144 - Related articles - All 7 versions

[PDF] arxiv.org

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier

The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Cited by 14 - Related articles - All 2 versions

[PDF] arxiv.org

Captioning images taken by people who are blind

D Gurari, Y Zhao, M Zhang, N Bhattacharya - Computer Vision–ECCV …, 2020 - Springer

While an important problem in the vision community is to design algorithms that can
automatically caption images, few publicly-available datasets for algorithm development …

Cited by 211 - Related articles - All 7 versions

[HTML] mdpi.com

[HTML] Smart glass system using deep learning for the blind and visually impaired

M Mukhiddinov, J Cho - Electronics, 2021 - mdpi.com

Individuals suffering from visual impairments and blindness encounter difficulties in moving
independently and overcoming various problems in their routine lives. As a solution, artificial …

Cited by 73 - Related articles - All 6 versions

[PDF] arxiv.org

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org

Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Cited by 31 - Related articles - All 2 versions

[PDF] thecvf.com

Grounding answers for visual questions asked by visually impaired people

C Chen, S Anjum, D Gurari - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com

Visual question answering is the task of answering questions about images. We introduce
the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual …

Cited by 55 - Related articles - All 6 versions

[PDF] acm.org

"I wouldn't say offensive but...": Disability-Centered Perspectives on Large Language Models

V Gadiraju, S Kane, S Dev, A Taylor, D Wang… - Proceedings of the …, 2023 - dl.acm.org

Large language models (LLMs) trained on real-world data can inadvertently reflect harmful
societal biases, particularly toward historically marginalized communities. While previous …

Cited by 24 - Related articles - All 5 versions

[PDF] usenix.org

"I am uncomfortable sharing what I can't see": Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications

T Akter, B Dosono, T Ahmed, A Kapadia… - 29th USENIX Security …, 2020 - usenix.org

The emergence of camera-based assistive technologies has empowered people with visual
impairments (VIP) to obtain independence in their daily lives. Popular services feature …

Cited by 88 - Related articles - All 8 versions

[PDF] thecvf.com

Benchmark platform for ultra-fine-grained visual categorization beyond human performance

X Yu, Y Zhao, Y Gao, X Yuan… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Deep learning methods have achieved remarkable success in fine-grained visual
categorization. Such successful categorization at sub-ordinate level, e.g., different animal or …

Cited by 32 - Related articles - All 4 versions

[PDF] thecvf.com

Story visualization by online text augmentation with context memory

D Ahn, D Kim, G Song, SH Kim, H Lee… - Proceedings of the …, 2023 - openaccess.thecvf.com

Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not
only rendering visual details from the text descriptions but also encoding a long-term context …

Cited by 6 - Related articles - All 7 versions