TIFA: Accurate and interpretable text-to-image faithfulness evaluation with question answering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Despite thousands of researchers, engineers, and artists actively working on improving
text-to-image generation models, systems often fail to produce images that accurately align with …

Reinforcement learning for fine-tuning text-to-image diffusion models

Y Fan, O Watkins, Y Du, H Liu, M Ryu… - Advances in …, 2024 - proceedings.neurips.cc
Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …

Aligning text-to-image models using human feedback

K Lee, H Liu, M Ryu, O Watkins, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep generative models have shown impressive results in text-to-image synthesis.
However, current text-to-image models often generate images that are inadequately aligned …

Grounded text-to-image synthesis with attention refocusing

Q Phung, S Ge, JB Huang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …

TextDiffuser-2: Unleashing the power of language models for text rendering

J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei - European Conference on …, 2025 - Springer
The diffusion model has been proven a powerful generative model in recent years, yet it
remains a challenge in generating visual text. Although existing work has endeavored to …

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-image generation

J Cho, Y Hu, R Garg, P Anderson, R Krishna… - arXiv preprint arXiv …, 2023 - arxiv.org
Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …

Divide, evaluate, and refine: Evaluating and improving text-to-image alignment with iterative VQA feedback

J Singh, L Zheng - Advances in Neural Information …, 2023 - proceedings.neurips.cc
The field of text-conditioned image generation has made unparalleled progress with the
recent advent of latent diffusion models. While revolutionary, as the complexity of given text …

Glyph-ByT5: A customized text encoder for accurate visual text rendering

Z Liu, W Liang, Z Liang, C Luo, J Li, G Huang… - … on Computer Vision, 2025 - Springer
Visual text rendering poses a fundamental challenge for contemporary text-to-image
generation models, with the core problem lying in text encoder deficiencies. To achieve …

GlyphControl: Glyph conditional control for visual text generation

Y Yang, D Gui, Y Yuan, W Liang… - Advances in …, 2024 - proceedings.neurips.cc
Recently, there has been an increasing interest in developing diffusion-based text-to-image
generative models capable of generating coherent and well-formed visual text. In this paper …