Inspecting the geographical representativeness of images from text-to-image models

A Basu, RV Babu, D Pruthi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Recent progress in generative models has resulted in models that produce both realistic as
well as relevant images for most textual inputs. These models are being used to generate …

Exploring the limitations in how ChatGPT introduces environmental justice issues in the United States: A case study of 3,108 counties

J Kim, J Lee, KM Jang, I Lourentzou - Telematics and Informatics, 2024 - Elsevier
The potential of Generative AI, such as ChatGPT, has sparked discussions among
researchers and the public. This study empirically explores the capabilities and limitations of …

Survey of cultural awareness in language models: Text and beyond

S Pawar, J Park, J Jin, A Arora, J Myung… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale deployment of large language models (LLMs) in various applications, such as
chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure …

A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - Computers in Biology …, 2024 - Elsevier
With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition

K Buettner, S Malakouti, XL Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing object recognition models have been shown to lack robustness in diverse
geographical scenarios due to domain shifts in design and context. Class representations …

A survey on image-text multimodal models

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration

Y Baek, CH Park, J Kim, YJ Heo, DS Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
To create culturally inclusive vision-language models (VLMs), the foremost requirement is
developing a test benchmark that can diagnose the models' ability to respond to questions …

GD-COMET: A Geo-Diverse Commonsense Inference Model

M Bhatia, V Shwartz - arXiv preprint arXiv:2310.15383, 2023 - arxiv.org
With the increasing integration of AI into everyday life, it's becoming crucial to design AI
systems that serve users from diverse backgrounds by making them culturally aware. In this …

See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding

A Ananthram, E Stengel-Eskin, C Vondrick… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) can respond to queries about images in many languages.
However, beyond language, culture affects how we see things. For example, individuals …

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor

J Chen, X Hei, Y Xue, Y Wei, J Xie, Y Cai… - Proceedings of the 32nd …, 2024 - dl.acm.org
Large multimodal models (LMMs) have shown remarkable performance in the visual
commonsense reasoning (VCR) task, which aims to answer a multiple-choice question …