HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

X Liu, Y He, L Guo, X Li, B Jin, P Li, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The potential for higher-resolution image generation using pretrained diffusion models is
immense, yet these models often struggle with issues of object repetition and structural …

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

P Gao, L Zhuo, Z Lin, C Liu, J Chen, R Du, E Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic
images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks …

MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

H Wu, S Shen, Q Hu, X Zhang, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have emerged as frontrunners in text-to-image generation for their
impressive capabilities. Nonetheless, their fixed image resolution during training often leads …

DiffuseHigh: Training-free Progressive High-Resolution Image Synthesis through Structure Guidance

Y Kim, G Hwang, E Park - arXiv preprint arXiv:2406.18459, 2024 - arxiv.org
The recent surge in large-scale generative models has spurred the development of vast fields in
computer vision. In particular, text-to-image diffusion models have garnered widespread …