Character-aware models improve visual text rendering

Y Hu, B Liu, J Kasai, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Despite thousands of researchers, engineers, and artists actively working on improving
text-to-image generation models, systems often fail to produce images that accurately align with …

Cited by 152 · Related articles · All 5 versions

Reinforcement learning for fine-tuning text-to-image diffusion models

Y Fan, O Watkins, Y Du, H Liu, M Ryu… - Advances in …, 2024 - proceedings.neurips.cc

Learning from human feedback has been shown to improve text-to-image models. These
techniques first learn a reward function that captures what humans care about in the task …

Cited by 158 · Related articles · All 7 versions

Aligning text-to-image models using human feedback

K Lee, H Liu, M Ryu, O Watkins, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org

Deep generative models have shown impressive results in text-to-image synthesis.
However, current text-to-image models often generate images that are inadequately aligned …

Cited by 214 · Related articles · All 2 versions

Grounded text-to-image synthesis with attention refocusing

Q Phung, S Ge, JB Huang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …

Cited by 81 · Related articles · All 3 versions

TextDiffuser-2: Unleashing the power of language models for text rendering

J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei - European Conference on …, 2025 - Springer

The diffusion model has been proven a powerful generative model in recent years, yet it
remains a challenge in generating visual text. Although existing work has endeavored to …

Cited by 39 · Related articles · All 2 versions

TextDiffuser: Diffusion models as text painters

J Chen, Y Huang, T Lv, L Cui… - Advances in Neural …, 2024 - proceedings.neurips.cc

Diffusion models have gained increasing attention for their impressive generation abilities
but currently struggle with rendering accurate and coherent text. To address this issue, we …

Cited by 91 · Related articles · All 5 versions

Davidsonian scene graph: Improving reliability in fine-grained evaluation for text-image generation

J Cho, Y Hu, R Garg, P Anderson, R Krishna… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating text-to-image models is notoriously difficult. A strong recent approach for
assessing text-image faithfulness is based on QG/A (question generation and answering) …

Cited by 60 · Related articles · All 3 versions

Divide, evaluate, and refine: Evaluating and improving text-to-image alignment with iterative VQA feedback

J Singh, L Zheng - Advances in Neural Information …, 2023 - proceedings.neurips.cc

The field of text-conditioned image generation has made unparalleled progress with the
recent advent of latent diffusion models. While revolutionary, as the complexity of given text …

Cited by 15 · Related articles · All 5 versions

Glyph-ByT5: A customized text encoder for accurate visual text rendering

Z Liu, W Liang, Z Liang, C Luo, J Li, G Huang… - … on Computer Vision, 2025 - Springer

Visual text rendering poses a fundamental challenge for contemporary text-to-image
generation models, with the core problem lying in text encoder deficiencies. To achieve …

Cited by 13 · Related articles · All 2 versions

GlyphControl: Glyph conditional control for visual text generation

Y Yang, D Gui, Y Yuan, W Liang… - Advances in …, 2024 - proceedings.neurips.cc

Recently, there has been an increasing interest in developing diffusion-based text-to-image
generative models capable of generating coherent and well-formed visual text. In this paper …

Cited by 45 · Related articles · All 6 versions