Vision-language pre-training: Basics, recent advances, and future trends

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Beyond efficiency: A systematic survey of resource-efficient large language models

G Bai, Z Chai, C Ling, S Wang, J Lu, N Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …

VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution

SM Hall, F Gonçalves Abrantes, H Zhu… - Advances in …, 2024 - proceedings.neurips.cc
We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language
models. We focus on occupation-related biases within a hegemonic system of binary …

GIVL: Improving geographical inclusivity of vision-language models with pre-training methods

D Yin, F Gao, G Thattai, M Johnston… - Proceedings of the …, 2023 - openaccess.thecvf.com
A key goal for the advancement of AI is to develop technologies that serve the needs not just
of one group but of all communities regardless of their geographical region. In fact, a …

Cross-view language modeling: Towards unified cross-lingual cross-modal pre-training

Y Zeng, W Zhou, A Luo, Z Cheng, X Zhang - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-
training framework that unifies cross-lingual and cross-modal pre-training with shared …

Can Vision-Language Models Think from a First-Person Perspective?

S Cheng, Z Guo, J Wu, K Fang, P Li, H Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities, with the …

Write and paint: Generative vision-language models are unified modal learners

S Diao, W Zhou, X Zhang, J Wang - arXiv preprint arXiv:2206.07699, 2022 - arxiv.org
Recent advances in vision-language pre-training have pushed the state-of-the-art on
various vision-language tasks, making machines more capable of multi-modal writing …

Towards an Exhaustive Evaluation of Vision-Language Foundation Models

E Salin, S Ayache, B Favre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vision-language foundation models have seen a considerable increase in performance in the
last few years. However, there is still a lack of comprehensive evaluation methods able to …

EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models

S Cheng, Z Guo, J Wu, K Fang, P Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities, with the …

What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases

AMH Tiong, J Zhao, B Li, J Li, SCH Hoi… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language (VL) models, pretrained on colossal image-text datasets, have attained
broad VL competence that is difficult to evaluate. A common belief is that a small number of …