GIVL: Improving geographical inclusivity of vision-language models with pre-training methods

A Basu, RV Babu, D Pruthi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Recent progress in generative models has resulted in models that produce both realistic as
well as relevant images for most textual inputs. These models are being used to generate …

Cited by 28 · Related articles · All 6 versions

Exploring the limitations in how ChatGPT introduces environmental justice issues in the United States: A case study of 3,108 counties

J Kim, J Lee, KM Jang, I Lourentzou - Telematics and Informatics, 2024 - Elsevier

The potential of Generative AI, such as ChatGPT, has sparked discussions among
researchers and the public. This study empirically explores the capabilities and limitations of …

Cited by 18 · Related articles · All 4 versions

Survey of cultural awareness in language models: Text and beyond

S Pawar, J Park, J Jin, A Arora, J Myung… - arXiv preprint arXiv …, 2024 - arxiv.org

Large-scale deployment of large language models (LLMs) in various applications, such as
chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure …

Cited by 3 · Related articles · All 4 versions

A survey on advancements in image-text multimodal models: From general techniques to biomedical implementations

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - Computers in Biology …, 2024 - Elsevier

With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Cited by 3 · Related articles · All 2 versions

Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition

K Buettner, S Malakouti, XL Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

Existing object recognition models have been shown to lack robustness in diverse
geographical scenarios due to domain shifts in design and context. Class representations …

Cited by 2 · Related articles · All 4 versions

A survey on image-text multimodal models

R Guo, J Wei, L Sun, B Yu, G Chang, D Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

With the significant advancements of Large Language Models (LLMs) in the field of Natural
Language Processing (NLP), the development of image-text multimodal models has …

Cited by 5 · Related articles · All 2 versions

Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration

Y Baek, CH Park, J Kim, YJ Heo, DS Chang… - arXiv preprint arXiv …, 2024 - arxiv.org

To create culturally inclusive vision-language models (VLMs), the foremost requirement is
developing a test benchmark that can diagnose the models' ability to respond to questions …

Cited by 2 · Related articles · All 3 versions

GD-COMET: A Geo-Diverse Commonsense Inference Model

M Bhatia, V Shwartz - arXiv preprint arXiv:2310.15383, 2023 - arxiv.org

With the increasing integration of AI into everyday life, it's becoming crucial to design AI
systems that serve users from diverse backgrounds by making them culturally aware. In this …

Cited by 6 · Related articles · All 5 versions

See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding

A Ananthram, E Stengel-Eskin, C Vondrick… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision-language models (VLMs) can respond to queries about images in many languages.
However, beyond language, culture affects how we see things. For example, individuals …

Cited by 5 · Related articles

Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor

J Chen, X Hei, Y Xue, Y Wei, J Xie, Y Cai… - Proceedings of the 32nd …, 2024 - dl.acm.org

Large multimodal models (LMMs) have shown remarkable performance in the visual
commonsense reasoning (VCR) task, which aims to answer a multiple-choice question …