Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

SM Hall, F Gonçalves Abrantes, H Zhu… - Advances in …, 2024 - proceedings.neurips.cc
We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language
models. We focus on occupation-related biases within a hegemonic system of binary …

GIVL: Improving geographical inclusivity of vision-language models with pre-training methods

D Yin, F Gao, G Thattai, M Johnston… - Proceedings of the …, 2023 - openaccess.thecvf.com
A key goal for the advancement of AI is to develop technologies that serve the needs not just
of one group but of all communities regardless of their geographical region. In fact, a …

Cross-view language modeling: Towards unified cross-lingual cross-modal pre-training

Y Zeng, W Zhou, A Luo, Z Cheng, X Zhang - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-
training framework that unifies cross-lingual and cross-modal pre-training with shared …

Can Vision-Language Models Think from a First-Person Perspective?

S Cheng, Z Guo, J Wu, K Fang, P Li, H Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities, with the …

Write and paint: Generative vision-language models are unified modal learners

S Diao, W Zhou, X Zhang, J Wang - arXiv preprint arXiv:2206.07699, 2022 - arxiv.org
Recent advances in vision-language pre-training have pushed the state-of-the-art on
various vision-language tasks, making machines more capable of multi-modal writing …

Towards an Exhaustive Evaluation of Vision-Language Foundation Models

E Salin, S Ayache, B Favre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vision-language foundation models have seen a considerable increase in performance in the
last few years. However, there is still a lack of comprehensive evaluation methods able to …

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

S Cheng, Z Guo, J Wu, K Fang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities, with the …

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

AMH Tiong, J Zhao, B Li, J Li, SCH Hoi… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language (VL) models, pretrained on colossal image-text datasets, have attained
broad VL competence that is difficult to evaluate. A common belief is that a small number of …