GLIGEN: Open-set grounded text-to-image generation

Y Li, H Liu, Q Wu, F Mu, J Yang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large-scale text-to-image diffusion models have made amazing advances. However, the
status quo is to use text input alone, which can impede controllability. In this work, we …

Latte: Latent diffusion transformer for video generation

X Ma, Y Wang, G Jia, X Chen, Z Liu, YF Li… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte
first extracts spatio-temporal tokens from input videos and then adopts a series of …

Enhancing detail preservation for customized text-to-image generation: A regularization-free approach

Y Zhou, R Zhang, T Sun, J Xu - arXiv preprint arXiv:2305.13579, 2023 - arxiv.org
Recent text-to-image generation models have demonstrated impressive capability of
generating text-aligned images with high fidelity. However, generating images of novel …

Parrot: Pareto-optimal multi-reward reinforcement learning framework for text-to-image generation

SH Lee, Y Li, J Ke, I Yoo, H Zhang, J Yu… - … on Computer Vision, 2025 - Springer
Recent works have demonstrated that using reinforcement learning (RL) with multiple
quality rewards can improve the quality of generated images in text-to-image (T2I) …

Renaissance: A survey into AI text-to-image generation in the era of large model

F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Text-to-image generation (TTI) refers to the usage of models that could process text input
and generate high fidelity images based on text descriptions. Text-to-image generation …

Exploiting the signal-leak bias in diffusion models

MN Everaert, A Fitsios, M Bocchio… - Proceedings of the …, 2024 - openaccess.thecvf.com
There is a bias in the inference pipeline of most diffusion models. This bias arises from a
signal leak whose distribution deviates from the noise distribution, creating a discrepancy …

ConceptLab: Creative generation using diffusion prior constraints

E Richardson, K Goldberg, Y Alaluf… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent text-to-image generative models have enabled us to transform our words into
vibrant, captivating imagery. The surge of personalization techniques that has followed has …

SkipDiff: Adaptive Skip Diffusion Model for High-Fidelity Perceptual Image Super-resolution

X Luo, Y Xie, Y Qu, Y Fu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
It is well-known that image quality assessment usually meets with the problem of
perception-distortion (pd) tradeoff. The existing deep image super-resolution (SR) methods either focus …

TexSliders: Diffusion-based texture editing in CLIP space

J Guerrero-Viu, M Hasan, A Roullier… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Generative models have enabled intuitive image creation and manipulation using natural
language. In particular, diffusion models have recently shown remarkable results for natural …

A survey of diffusion based image generation models: Issues and their solutions

T Zhang, Z Wang, J Huang, MM Tasnim… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, there has been significant progress in the development of large models. Following
the success of ChatGPT, numerous language models have been introduced, demonstrating …