TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering
Despite thousands of researchers, engineers, and artists actively working on improving text-
to-image generation models, systems often fail to produce images that accurately align with …
Reinforcement learning for fine-tuning text-to-image diffusion models
Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …
Aligning text-to-image models using human feedback
Deep generative models have shown impressive results in text-to-image synthesis.
However, current text-to-image models often generate images that are inadequately aligned …
Grounded text-to-image synthesis with attention refocusing
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …
TextDiffuser-2: Unleashing the power of language models for text rendering
The diffusion model has been proven a powerful generative model in recent years, yet it
remains a challenge in generating visual text. Although existing work has endeavored to …
TextDiffuser: Diffusion models as text painters
Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …
Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-image generation
Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …
Divide, evaluate, and refine: Evaluating and improving text-to-image alignment with iterative VQA feedback
The field of text-conditioned image generation has made unparalleled progress with the
recent advent of latent diffusion models. While revolutionary, as the complexity of given text …
Glyph-ByT5: A customized text encoder for accurate visual text rendering
Visual text rendering poses a fundamental challenge for contemporary text-to-image
generation models, with the core problem lying in text encoder deficiencies. To achieve …
GlyphControl: Glyph conditional control for visual text generation
Recently, there has been an increasing interest in developing diffusion-based text-to-image
generative models capable of generating coherent and well-formed visual text. In this paper …