Vision-by-language for training-free compositional image retrieval
Given an image and a target modification (e.g., an image of the Eiffel Tower and the text
"without people and at night-time"), Compositional Image Retrieval (CIR) aims to retrieve the …
DreamSync: Aligning text-to-image generation with image understanding feedback
Despite their widespread success, Text-to-Image models (T2I) still struggle to produce
images that are both aesthetically pleasing and faithful to the user's input text. We introduce …
Not all noises are created equally: Diffusion noise selection and optimization
Diffusion models that can generate high-quality data from randomly sampled Gaussian
noises have become the mainstream generative method in both academia and industry. Are …
Improving text-to-image consistency via automatic prompt optimization
Impressive advances in text-to-image (T2I) generative models have yielded a plethora of
high-performing models which are able to generate aesthetically appealing, photorealistic …
VersaT2I: Improving text-to-image models with versatile reward
Recent text-to-image (T2I) models have benefited from large-scale and high-quality data,
demonstrating impressive performance. However, these T2I models still struggle to produce …
Removing Distributional Discrepancies in Captions Improves Image-Text Alignment
In this paper, we introduce a model designed to improve the prediction of image-text
alignment, targeting the challenge of compositional understanding in current visual …
ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization
Text-to-Image (T2I) models have made significant advancements in recent years, but they
still struggle to accurately capture intricate details specified in complex compositional …
Diverse Diffusion: Enhancing Image Diversity in Text-to-Image Generation
Latent diffusion models excel at producing high-quality images from text. Yet, concerns
appear about the lack of diversity in the generated imagery. To tackle this, we introduce …
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
Although text-to-image (T2I) models exhibit remarkable generation capabilities, they
frequently fail to accurately bind semantically related objects or attributes in the input …
Consistency-diversity-realism Pareto fronts of conditional image generative models
Building world models that accurately and comprehensively represent the real world is the
utmost aspiration for conditional image generative models as it would enable their use as …