AnyDoor: Zero-shot object-level image customization
This work presents AnyDoor, a diffusion-based image generator with the power to teleport
target objects to new scenes at user-specified locations with desired shapes. Instead of …
Generative multimodal models are in-context learners
Humans can easily solve multimodal tasks in context with only a few demonstrations or
simple instructions, which current multimodal systems largely struggle to imitate. In this work …
InstantBooth: Personalized text-to-image generation without test-time finetuning
Recent advances in personalized image generation have enabled pre-trained text-to-image
models to learn new concepts from specific image sets. However, these methods often …
DreamLLM: Synergistic multimodal comprehension and creation
This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …
Style aligned image generation via shared attention
Large-scale Text-to-Image (T2I) models have rapidly gained prominence across
creative fields, generating visually compelling outputs from textual prompts. However …
Alpha-CLIP: A CLIP model focusing on wherever you want
Contrastive Language-Image Pre-training (CLIP) plays an essential role in
extracting valuable content information from images across diverse tasks. It aligns textual …
LAVIS: A library for language-vision intelligence
We introduce LAVIS, an open-source deep learning library for LAnguage-VISion research
and applications. LAVIS aims to serve as a one-stop comprehensive library that brings …
GPT4Point: A unified framework for point-language understanding and generation
Multimodal Large Language Models (MLLMs) have excelled in 2D image-text
comprehension and image generation, but their understanding of the 3D world is notably …
VideoBooth: Diffusion-based video generation with image prompts
Text-driven video generation witnesses rapid progress. However, merely using text prompts
is not enough to depict the desired subject appearance that accurately aligns with users' …
Subject-Diffusion: Open domain personalized text-to-image generation without test-time fine-tuning
Recent progress in personalized image generation using diffusion models has been
significant. However, development in the area of open-domain and non-fine-tuning …