Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …
Grounded text-to-image synthesis with attention refocusing
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …
Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation
Q Guo, T Lin - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, diffusion-based methods like InstructPix2Pix (IP2P) have achieved effective
instruction-based image editing, requiring only natural language instructions from the user …
Conform: Contrast is all you need for high-fidelity text-to-image diffusion models
Images produced by text-to-image diffusion models might not always faithfully represent the
semantic intent of the provided text prompt, where the model might overlook or entirely fail to …
Initno: Boosting text-to-image diffusion models via initial noise optimization
Recent strides in the development of diffusion models, exemplified by advancements such as
Stable Diffusion, have underscored their remarkable prowess in generating visually …
Not all noises are created equally: Diffusion noise selection and optimization
Diffusion models that can generate high-quality data from randomly sampled Gaussian
noises have become the mainstream generative method in both academia and industry. Are …
Object-conditioned energy-based attention map alignment in text-to-image diffusion models
Text-to-image diffusion models have shown great success in generating high-quality text-
guided images. Yet, these models may still fail to semantically align generated images with …
Improving compositional text-to-image generation with large vision-language models
Recent advancements in text-to-image models, particularly diffusion models, have shown
significant promise. However, compositional text-to-image models frequently encounter …
MC²: Multi-concept Guidance for Customized Multi-concept Generation
Customized text-to-image generation aims to synthesize instantiations of user-specified
concepts and has achieved unprecedented progress in handling individual concept …
Enhancing semantic mapping in text-to-image diffusion via Gather-and-Bind
H Fu, G Cheng - Computers & Graphics, 2024 - Elsevier
Text-to-image synthesis is a challenging task that aims to generate realistic and diverse
images from natural language descriptions. However, existing text-to-image diffusion …