Vision-by-language for training-free compositional image retrieval

S Karthik, K Roth, M Mancini, Z Akata - arXiv preprint arXiv:2310.09291, 2023 - arxiv.org
Given an image and a target modification (e.g., an image of the Eiffel tower and the text
"without people and at night-time"), Compositional Image Retrieval (CIR) aims to retrieve the …

DreamSync: Aligning text-to-image generation with image understanding feedback

J Sun, D Fu, Y Hu, S Wang, R Rassin… - Synthetic Data for …, 2023 - openreview.net
Despite their widespread success, Text-to-Image models (T2I) still struggle to produce
images that are both aesthetically pleasing and faithful to the user's input text. We introduce …

Not all noises are created equally: Diffusion noise selection and optimization

Z Qi, L Bai, H Xiong, Z Xie - arXiv preprint arXiv:2407.14041, 2024 - arxiv.org
Diffusion models that can generate high-quality data from randomly sampled Gaussian
noises have become the mainstream generative method in both academia and industry. Are …

Improving text-to-image consistency via automatic prompt optimization

O Mañas, P Astolfi, M Hall, C Ross, J Urbanek… - arXiv preprint arXiv …, 2024 - arxiv.org
Impressive advances in text-to-image (T2I) generative models have yielded a plethora of
high performing models which are able to generate aesthetically appealing, photorealistic …

VersaT2I: Improving text-to-image models with versatile reward

J Guo, W Chai, J Deng, HW Huang, T Ye, Y Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data,
demonstrating impressive performance. However, these T2I models still struggle to produce …

Removing Distributional Discrepancies in Captions Improves Image-Text Alignment

Y Li, H Liu, M Cai, Y Li, E Shechtman, Z Lin… - … on Computer Vision, 2025 - Springer
In this paper, we introduce a model designed to improve the prediction of image-text
alignment, targeting the challenge of compositional understanding in current visual …

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

L Eyring, S Karthik, K Roth, A Dosovitskiy… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-Image (T2I) models have made significant advancements in recent years, but they
still struggle to accurately capture intricate details specified in complex compositional …

Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation

M Zameshina, O Teytaud, L Najman - arXiv preprint arXiv:2310.12583, 2023 - arxiv.org
Latent diffusion models excel at producing high-quality images from text. Yet, concerns
appear about the lack of diversity in the generated imagery. To tackle this, we introduce …

Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis

T Hu, L Li, J van de Weijer, H Gao, FS Khan… - arXiv preprint arXiv …, 2024 - arxiv.org
Although text-to-image (T2I) models exhibit remarkable generation capabilities, they
frequently fail to accurately bind semantically related objects or attributes in the input …

Consistency-diversity-realism Pareto fronts of conditional image generative models

P Astolfi, M Careil, M Hall, O Mañas, M Muckley… - arXiv preprint arXiv …, 2024 - arxiv.org
Building world models that accurately and comprehensively represent the real world is the
utmost aspiration for conditional image generative models as it would enable their use as …