Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Grounded text-to-image synthesis with attention refocusing

Q Phung, S Ge, JB Huang - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Driven by the scalable diffusion models trained on large-scale datasets, text-to-image
synthesis methods have shown compelling results. However, these models still fail to …

Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation

Q Guo, T Lin - Proceedings of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recently, diffusion-based methods like InstructPix2Pix (IP2P) have achieved effective
instruction-based image editing, requiring only natural language instructions from the user …

CONFORM: Contrast is all you need for high-fidelity text-to-image diffusion models

THS Meral, E Simsar, F Tombari… - Proceedings of the …, 2024 - openaccess.thecvf.com
Images produced by text-to-image diffusion models might not always faithfully represent the
semantic intent of the provided text prompt, where the model might overlook or entirely fail to …

InitNO: Boosting text-to-image diffusion models via initial noise optimization

X Guo, J Liu, M Cui, J Li, H Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent strides in the development of diffusion models, exemplified by advancements such as
Stable Diffusion, have underscored their remarkable prowess in generating visually …

Not all noises are created equally: Diffusion noise selection and optimization

Z Qi, L Bai, H Xiong, Z Xie - arXiv preprint arXiv:2407.14041, 2024 - arxiv.org
Diffusion models that can generate high-quality data from randomly sampled Gaussian
noises have become the mainstream generative method in both academia and industry. Are …

Object-conditioned energy-based attention map alignment in text-to-image diffusion models

Y Zhang, P Yu, YN Wu - European Conference on Computer Vision, 2025 - Springer
Text-to-image diffusion models have shown great success in generating high-quality
text-guided images. Yet, these models may still fail to semantically align generated images with …

Improving compositional text-to-image generation with large vision-language models

S Wen, G Fang, R Zhang, P Gao, H Dong… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in text-to-image models, particularly diffusion models, have shown
significant promise. However, compositional text-to-image models frequently encounter …

MC²: Multi-concept Guidance for Customized Multi-concept Generation

J Jiang, Y Zhang, K Feng, X Wu, W Zuo - arXiv preprint arXiv:2404.05268, 2024 - arxiv.org
Customized text-to-image generation aims to synthesize instantiations of user-specified
concepts and has achieved unprecedented progress in handling individual concept …

Enhancing semantic mapping in text-to-image diffusion via Gather-and-Bind

H Fu, G Cheng - Computers & Graphics, 2024 - Elsevier
Text-to-image synthesis is a challenging task that aims to generate realistic and diverse
images from natural language descriptions. However, existing text-to-image diffusion …