Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

Citing articles: 4 results

Arondight: Red teaming large vision language models with auto-generated multi-modal jailbreak prompts

Y Liu, C Cai, X Zhang, X Yuan, C Wang - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

Large Vision Language Models (VLMs) extend and enhance the perceptual abilities of
Large Language Models (LLMs). Despite offering new possibilities for LLM applications …

Cited by 6 · 5 versions

Survey of Cultural Awareness in Language Models: Text and Beyond

S Pawar, J Park, J Jin, A Arora, J Myung… - arXiv preprint arXiv …, 2024 - arxiv.org

Large-scale deployment of large language models (LLMs) in various applications, such as
chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure …

Cited by 1 · 4 versions

FoodieQA: A Multimodal Dataset for Fine-Grained Understanding of Chinese Food Culture

W Li, X Zhang, J Li, Q Peng, R Tang, L Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org

Food is a rich and varied dimension of cultural heritage, crucial to both individuals and
social groups. To bridge the gap in the literature on the often-overlooked regional diversity in …

Cited by 4

Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts

EZ Zeng, Y Chen, A Wong - arXiv preprint arXiv:2410.21314, 2024 - arxiv.org

Recent advances in image generation have made diffusion models powerful tools for
creating high-quality images. However, their iterative denoising process makes …