SVDiff: Compact parameter space for diffusion fine-tuning
Recently, diffusion models have achieved remarkable success in text-to-image generation,
enabling the creation of high-quality images from text prompts and various conditions …
VBench: Comprehensive benchmark suite for video generative models
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …
FreeU: Free lunch in diffusion U-Net
In this paper, we uncover the untapped potential of the diffusion U-Net, which serves as a "free
lunch" that substantially improves the generation quality on the fly. We initially investigate …
Diffusion hyperfeatures: Searching through time and space for semantic correspondence
Diffusion models have been shown to be capable of generating high-quality images,
suggesting that they could contain meaningful internal representations. Unfortunately, the …
Visual instruction inversion: Image editing via image prompting
Text-conditioned image editing has emerged as a powerful tool for editing images. However,
in many situations, language can be ambiguous and ineffective in describing specific image …
VideoBooth: Diffusion-based video generation with image prompts
Text-driven video generation witnesses rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …
CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation
We present CoDi-2, a Multimodal Large Language Model (MLLM) for learning in-context
interleaved multimodal representations. By aligning modalities with language for …
Domain-agnostic tuning-encoder for fast personalization of text-to-image models
Text-to-image (T2I) personalization allows users to guide the creative image generation
process by combining their own visual concepts in natural language prompts. Recently …
Concept decomposition for visual exploration and inspiration
A creative idea is often born from transforming, combining, and modifying ideas from existing
visual examples capturing various concepts. However, one cannot simply copy the concept …
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
This paper unravels the potential of sketches for diffusion models, addressing the deceptive
promise of direct sketch control in generative AI. We importantly democratise the process …