Privacy-enhancing face biometrics: A comprehensive survey

B Meden, P Rot, P Terhörst, N Damer… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Biometric recognition technology has made significant advances over the last decade and is
now used across a number of services and applications. However, this widespread …

From image to language: A critical analysis of visual question answering (VQA) approaches, challenges, and opportunities

MF Ishmam, MSH Shovon, MF Mridha, N Dey - Information Fusion, 2024 - Elsevier
The multimodal task of Visual Question Answering (VQA) encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …

Captioning images taken by people who are blind

D Gurari, Y Zhao, M Zhang, N Bhattacharya - Computer Vision–ECCV …, 2020 - Springer
While an important problem in the vision community is to design algorithms that can
automatically caption images, few publicly-available datasets for algorithm development …

Smart glass system using deep learning for the blind and visually impaired

M Mukhiddinov, J Cho - Electronics, 2021 - mdpi.com
Individuals suffering from visual impairments and blindness encounter difficulties in moving
independently and overcoming various problems in their routine lives. As a solution, artificial …

Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models

H Lovenia, W Dai, S Cahyawijaya, Z Ji… - arXiv preprint arXiv …, 2023 - arxiv.org
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …

Grounding answers for visual questions asked by visually impaired people

C Chen, S Anjum, D Gurari - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Visual question answering is the task of answering questions about images. We introduce
the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual …

"I wouldn't say offensive but...": Disability-Centered Perspectives on Large Language Models

V Gadiraju, S Kane, S Dev, A Taylor, D Wang… - Proceedings of the …, 2023 - dl.acm.org
Large language models (LLMs) trained on real-world data can inadvertently reflect harmful
societal biases, particularly toward historically marginalized communities. While previous …

"I am uncomfortable sharing what I can't see": Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications

T Akter, B Dosono, T Ahmed, A Kapadia… - 29th USENIX Security …, 2020 - usenix.org
The emergence of camera-based assistive technologies has empowered people with visual
impairments (VIP) to obtain independence in their daily lives. Popular services feature …

Benchmark platform for ultra-fine-grained visual categorization beyond human performance

X Yu, Y Zhao, Y Gao, X Yuan… - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Deep learning methods have achieved remarkable success in fine-grained visual
categorization. Such successful categorization at sub-ordinate level, e.g., different animal or …

Story visualization by online text augmentation with context memory

D Ahn, D Kim, G Song, SH Kim, H Lee… - Proceedings of the …, 2023 - openaccess.thecvf.com
Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not
only rendering visual details from the text descriptions but also encoding a long-term context …