SDXL: Improving latent diffusion models for high-resolution image synthesis
D Podell, Z English, K Lacey, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to
previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone …
MVDream: Multi-view diffusion for 3D generation
We propose MVDream, a multi-view diffusion model that is able to generate geometrically
consistent multi-view images from a given text prompt. By leveraging image diffusion models …
Stable video diffusion: Scaling latent video diffusion models to large datasets
We present Stable Video Diffusion, a latent video diffusion model for high-resolution,
state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …
Emu edit: Precise image editing via recognition and generation tasks
Instruction-based image editing holds immense potential for a variety of applications as it
enables users to perform any editing operation using a natural language instruction …
SeeSR: Towards semantics-aware real-world image super-resolution
Owing to their powerful generative priors, pre-trained text-to-image (T2I) diffusion models
have become increasingly popular in solving the real-world image super-resolution …
GAvatar: Animatable 3D Gaussian avatars with implicit mesh learning
Gaussian splatting has emerged as a powerful 3D representation that harnesses the
advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper we …
Photorealistic video generation with diffusion models
We present WALT, a transformer-based approach for photorealistic video generation via
diffusion modeling. Our approach has two key design decisions. First, we use a causal …
Emu video: Factorizing text-to-video generation by explicit image conditioning
We present Emu Video, a text-to-video generation model that factorizes the generation into
two steps: first generating an image conditioned on the text, and then generating a video …
FlowVid: Taming imperfect optical flows for consistent video-to-video synthesis
Diffusion models have transformed image-to-image (I2I) synthesis and are now
permeating into videos. However, the advancement of video-to-video (V2V) synthesis has …
Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization
Diffusion models have demonstrated impressive performance in various image generation,
editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable …