SVDiff: Compact parameter space for diffusion fine-tuning

L Han, Y Li, H Zhang, P Milanfar… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, diffusion models have achieved remarkable success in text-to-image generation,
enabling the creation of high-quality images from text prompts and various conditions …

VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

FreeU: Free lunch in diffusion U-Net

C Si, Z Huang, Y Jiang, Z Liu - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
In this paper, we uncover the untapped potential of the diffusion U-Net, which serves as a "free
lunch" that substantially improves the generation quality on the fly. We initially investigate …

Diffusion hyperfeatures: Searching through time and space for semantic correspondence

G Luo, L Dunlap, DH Park… - Advances in Neural …, 2024 - proceedings.neurips.cc
Diffusion models have been shown to be capable of generating high-quality images,
suggesting that they could contain meaningful internal representations. Unfortunately, the …

Visual instruction inversion: Image editing via image prompting

T Nguyen, Y Li, U Ojha, YJ Lee - Advances in Neural …, 2024 - proceedings.neurips.cc
Text-conditioned image editing has emerged as a powerful tool for editing images. However,
in many situations, language can be ambiguous and ineffective in describing specific image …

VideoBooth: Diffusion-based video generation with image prompts

Y Jiang, T Wu, S Yang, C Si, D Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-driven video generation has witnessed rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …

CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

Z Tang, Z Yang, M Khademi, Y Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present CoDi-2, a Multimodal Large Language Model (MLLM) for learning in-context
interleaved multimodal representations. By aligning modalities with language for …

Domain-agnostic tuning-encoder for fast personalization of text-to-image models

M Arar, R Gal, Y Atzmon, G Chechik… - SIGGRAPH Asia 2023 …, 2023 - dl.acm.org
Text-to-image (T2I) personalization allows users to guide the creative image generation
process by combining their own visual concepts in natural language prompts. Recently …

Concept decomposition for visual exploration and inspiration

Y Vinker, A Voynov, D Cohen-Or, A Shamir - ACM Transactions on …, 2023 - dl.acm.org
A creative idea is often born from transforming, combining, and modifying ideas from existing
visual examples capturing various concepts. However, one cannot simply copy the concept …

It's All About Your Sketch: Democratising Sketch Control in Diffusion Models

S Koley, AK Bhunia, D Sekhri, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper unravels the potential of sketches for diffusion models, addressing the deceptive
promise of direct sketch control in generative AI. We importantly democratise the process …