SDXL: Improving latent diffusion models for high-resolution image synthesis
D Podell, Z English, K Lacey, A Blattmann… - arXiv preprint arXiv …, 2023 - arxiv.org
We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to
previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone …
MVDream: Multi-view diffusion for 3D generation
We propose MVDream, a multi-view diffusion model that is able to generate geometrically
consistent multi-view images from a given text prompt. By leveraging image diffusion models …
Stable video diffusion: Scaling latent video diffusion models to large datasets
We present Stable Video Diffusion, a latent video diffusion model for high-resolution,
state-of-the-art text-to-video and image-to-video generation. Recently, latent diffusion models trained …
Emu edit: Precise image editing via recognition and generation tasks
Instruction-based image editing holds immense potential for a variety of applications as it
enables users to perform any editing operation using a natural language instruction …
SeeSR: Towards semantics-aware real-world image super-resolution
Owing to their powerful generative priors, pre-trained text-to-image (T2I) diffusion models
have become increasingly popular in solving the real-world image super-resolution …
GAvatar: Animatable 3D Gaussian avatars with implicit mesh learning
Gaussian splatting has emerged as a powerful 3D representation that harnesses the
advantages of both explicit (mesh) and implicit (NeRF) 3D representations. In this paper we …
Photorealistic video generation with diffusion models
We present WALT, a transformer-based approach for photorealistic video generation via
diffusion modeling. Our approach has two key design decisions. First, we use a causal …
Emu video: Factorizing text-to-video generation by explicit image conditioning
We present Emu Video, a text-to-video generation model that factorizes the generation into
two steps: first generating an image conditioned on the text, and then generating a video …
FlowVid: Taming imperfect optical flows for consistent video-to-video synthesis
Diffusion models have transformed image-to-image (I2I) synthesis and are now
permeating into videos. However, the advancement of video-to-video (V2V) synthesis has …
Pixel-aware stable diffusion for realistic image super-resolution and personalized stylization
Diffusion models have demonstrated impressive performance in various image generation,
editing, enhancement and translation tasks. In particular, the pre-trained text-to-image stable …