Privacy–enhancing face biometrics: A comprehensive survey
Biometric recognition technology has made significant advances over the last decade and is
now used across a number of services and applications. However, this widespread …
From image to language: A critical analysis of Visual Question Answering (VQA) approaches, challenges, and opportunities
The multimodal task of Visual Question Answering (VQA), encompassing elements of
Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers …
Captioning images taken by people who are blind
While an important problem in the vision community is to design algorithms that can
automatically caption images, few publicly-available datasets for algorithm development …
Smart glass system using deep learning for the blind and visually impaired
M Mukhiddinov, J Cho - Electronics, 2021 - mdpi.com
Individuals suffering from visual impairments and blindness encounter difficulties in moving
independently and overcoming various problems in their routine lives. As a solution, artificial …
Negative object presence evaluation (NOPE) to measure object hallucination in vision-language models
Object hallucination poses a significant challenge in vision-language (VL) models, often
leading to the generation of nonsensical or unfaithful responses with non-existent objects …
Grounding answers for visual questions asked by visually impaired people
Visual question answering is the task of answering questions about images. We introduce
the VizWiz-VQA-Grounding dataset, the first dataset that visually grounds answers to visual …
"I wouldn't say offensive but...": Disability-Centered Perspectives on Large Language Models
Large language models (LLMs) trained on real-world data can inadvertently reflect harmful
societal biases, particularly toward historically marginalized communities. While previous …
"I am uncomfortable sharing what I can't see": Privacy Concerns of the Visually Impaired with Camera Based Assistive Applications
The emergence of camera-based assistive technologies has empowered people with visual
impairments (VIP) to obtain independence in their daily lives. Popular services feature …
Benchmark platform for ultra-fine-grained visual categorization beyond human performance
Deep learning methods have achieved remarkable success in fine-grained visual
categorization. Such successful categorization at the subordinate level, e.g., different animal or …
Story visualization by online text augmentation with context memory
Story visualization (SV) is a challenging text-to-image generation task for the difficulty of not
only rendering visual details from the text descriptions but also encoding a long-term context …