Controllable generation with text-to-image diffusion models: A survey

P Cao, F Zhou, Q Song, L Yang - arXiv preprint arXiv:2403.04279, 2024 - arxiv.org
In the rapidly advancing realm of visual generation, diffusion models have revolutionized the
landscape, marking a significant shift in capabilities with their impressive text-guided …

Towards diverse and consistent typography generation

W Shimoda, D Haraguchi, S Uchida… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this work, we consider the typography generation task that aims at producing diverse
typographic styling for the given graphic document. We formulate typography generation as …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing

Y Shu, W Zeng, Z Li, F Zhao, Y Zhou - arXiv preprint arXiv:2402.03082, 2024 - arxiv.org
Visual text, a pivotal element in both document and scene images, speaks volumes and
attracts significant attention in the computer vision domain. Beyond visual text detection and …

Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation

S Lakhanpal, S Chopra, V Jain, A Chadha… - arXiv preprint arXiv …, 2024 - arxiv.org
Over the past few years, Text-to-Image (T2I) generation approaches based on diffusion
models have gained significant attention. However, vanilla diffusion models often suffer from …

ARTIST: Improving the Generation of Text-rich Images by Disentanglement

J Zhang, Y Zhou, J Gu, C Wigington, T Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have demonstrated exceptional capabilities in generating a broad
spectrum of visual content, yet their proficiency in rendering text is still limited: they often …

Text-Animator: Controllable Visual Text Video Generation

L Liu, Q Liu, S Qian, Y Zhou, W Zhou, H Li, L Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Video generation is a challenging yet pivotal task in various industries, such as gaming,
e-commerce, and advertising. One significant unresolved aspect within T2V is the effective …

Prompt-Consistency Image Generation (PCIG): A Unified Framework Integrating LLMs, Knowledge Graphs, and Controllable Diffusion Models

Y Sun, Z Chu, Z Qin, K Ren - arXiv preprint arXiv:2406.16333, 2024 - arxiv.org
The rapid advancement of Text-to-Image (T2I) generative models has enabled the synthesis
of high-quality images guided by textual descriptions. Despite this significant progress, these …

Kinetic Typography Diffusion Model

S Park, I Bae, S Shin, HG Jeon - arXiv preprint arXiv:2407.10476, 2024 - arxiv.org
This paper introduces a method for realistic kinetic typography that generates user-preferred
animatable 'text content'. We draw on recent advances in guided video diffusion models to …

GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

J Ma, Y Deng, C Chen, H Lu, Z Yang - arXiv preprint arXiv:2407.02252, 2024 - arxiv.org
Posters play a crucial role in marketing and advertising, contributing significantly to industrial
design by enhancing visual communication and brand visibility. With recent advances in …