Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
intelligence that have been developed in the last few years. We group these approaches …
Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text
S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …
but improved evaluation approaches are rarely widely adopted. This issue has become …
Scaling instruction-finetuned language models
Finetuning language models on a collection of datasets phrased as instructions has been
shown to improve model performance and generalization to unseen tasks. In this paper we …
shown to improve model performance and generalization to unseen tasks. In this paper we …
[PDF][PDF] Scaling autoregressive models for content-rich text-to-image generation
Abstract We present the Pathways [1] Autoregressive Text-to-Image (Parti) model, which
generates high-fidelity photorealistic images and supports content-rich synthesis involving …
generates high-fidelity photorealistic images and supports content-rich synthesis involving …
Pali: A jointly-scaled multilingual language-image model
Effective scaling and a flexible task interface enable large language models to excel at many
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …
tasks. We present PaLI (Pathways Language and Image model), a model that extends this …
Photorealistic text-to-image diffusion models with deep language understanding
We present Imagen, a text-to-image diffusion model with an unprecedented degree of
photorealism and a deep level of language understanding. Imagen builds on the power of …
photorealism and a deep level of language understanding. Imagen builds on the power of …
Egoschema: A diagnostic benchmark for very long-form video language understanding
K Mangalam, R Akshulakov… - Advances in Neural …, 2024 - proceedings.neurips.cc
We introduce EgoSchema, a very long-form video question-answering dataset, and
benchmark to evaluate long video understanding capabilities of modern vision and …
benchmark to evaluate long video understanding capabilities of modern vision and …
Sugarcrepe: Fixing hackable benchmarks for vision-language compositionality
In the last year alone, a surge of new benchmarks to measure $\textit {compositional} $
understanding of vision-language models have permeated the machine learning ecosystem …
understanding of vision-language models have permeated the machine learning ecosystem …
Auditing large language models: a three-layered approach
Large language models (LLMs) represent a major advance in artificial intelligence (AI)
research. However, the widespread use of LLMs is also coupled with significant ethical and …
research. However, the widespread use of LLMs is also coupled with significant ethical and …
What is human-centered about human-centered AI? A map of the research landscape
T Capel, M Brereton - Proceedings of the 2023 CHI conference on …, 2023 - dl.acm.org
The application of Artificial Intelligence (AI) across a wide range of domains comes with both
high expectations of its benefits and dire predictions of misuse. While AI systems have …
high expectations of its benefits and dire predictions of misuse. While AI systems have …