Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Beyond efficiency: A systematic survey of resource-efficient large language models
The burgeoning field of Large Language Models (LLMs), exemplified by sophisticated
models like OpenAI's ChatGPT, represents a significant advancement in artificial …
VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution
We introduce VisoGender, a novel dataset for benchmarking gender bias in vision-language
models. We focus on occupation-related biases within a hegemonic system of binary …
GIVL: Improving geographical inclusivity of vision-language models with pre-training methods
A key goal for the advancement of AI is to develop technologies that serve the needs not just
of one group but of all communities regardless of their geographical region. In fact, a …
Cross-view language modeling: Towards unified cross-lingual cross-modal pre-training
In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-
training framework that unifies cross-lingual and cross-modal pre-training with shared …
Can Vision-Language Models Think from a First-Person Perspective?
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities, with the …
Write and paint: Generative vision-language models are unified modal learners
Recent advances in vision-language pre-training have pushed the state-of-the-art on
various vision-language tasks, making machines more capable of multi-modal writing …
Towards an Exhaustive Evaluation of Vision-Language Foundation Models
Vision-language foundation models have had considerable increase in performances in the
last few years. However, there is still a lack of comprehensive evaluation methods able to …
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Vision-language models (VLMs) have recently shown promising results in traditional
downstream tasks. Evaluation studies have emerged to assess their abilities with the …
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
Vision-language (VL) models, pretrained on colossal image-text datasets, have attained
broad VL competence that is difficult to evaluate. A common belief is that a small number of …