Emu Video: Factorizing text-to-video generation by explicit image conditioning

R Girdhar, M Singh, A Brown, Q Duval, S Azadi… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Emu Video, a text-to-video generation model that factorizes the generation into
two steps: first generating an image conditioned on the text, and then generating a video …

ControlStyle: Text-driven stylized image generation using diffusion priors

J Chen, Y Pan, T Yao, T Mei - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
Recently, the multimedia community has witnessed the rise of diffusion models trained on
large-scale multi-modal data for visual content creation, particularly in the field of text-to …

ChatPainter: Improving text to image generation using dialogue

S Sharma, D Suhubdy, V Michalski, SE Kahou… - arXiv preprint arXiv …, 2018 - arxiv.org
Synthesizing realistic images from text descriptions on a dataset like Microsoft Common
Objects in Context (MS COCO), where each image can contain several objects, is a …

TextDiffuser-2: Unleashing the power of language models for text rendering

J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei - arXiv preprint arXiv …, 2023 - arxiv.org
The diffusion model has proven to be a powerful generative model in recent years, yet
generating visual text remains a challenge. Several methods alleviated this issue by …

Enhancing detail preservation for customized text-to-image generation: A regularization-free approach

Y Zhou, R Zhang, T Sun, J Xu - arXiv preprint arXiv:2305.13579, 2023 - arxiv.org
Recent text-to-image generation models have demonstrated an impressive capability of
generating text-aligned images with high fidelity. However, generating images of novel …

Prompt-to-prompt image editing with cross attention control

A Hertz, R Mokady, J Tenenbaum, K Aberman… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent large-scale text-driven synthesis models have attracted much attention thanks to
their remarkable capabilities of generating highly diverse images that follow given text …

UFOGen: You forward once large-scale text-to-image generation via diffusion GANs

Y Xu, Y Zhao, Z Xiao, T Hou - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Text-to-image diffusion models have demonstrated remarkable capabilities in transforming
text prompts into coherent images, yet the computational cost of the multi-step inference …

StyleGAN-NADA: CLIP-guided domain adaptation of image generators

R Gal, O Patashnik, H Maron, AH Bermano… - ACM Transactions on …, 2022 - dl.acm.org
Can a generative model be trained to produce images from a specific domain, guided only
by a text prompt, without seeing any image? In other words: can an image generator be …

Pick-a-Pic: An open dataset of user preferences for text-to-image generation

Y Kirstain, A Polyak, U Singer… - Advances in …, 2023 - proceedings.neurips.cc
The ability to collect a large dataset of human preferences from text-to-image users is
usually limited to companies, making such datasets inaccessible to the public. To address …

GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning

J Lv, Y Huang, M Yan, J Huang, J Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in text-to-video generation have harnessed the power of diffusion models
to create visually compelling content conditioned on text prompts. However, they usually …