Common diffusion noise schedules and sample steps are flawed
Proceedings of the IEEE/CVF Winter Conference on Applications …, 2024 • openaccess.thecvf.com
Abstract
We discover that common diffusion noise schedules do not enforce the last timestep to have zero signal-to-noise ratio (SNR), and some implementations of diffusion samplers do not start from the last timestep. Such designs are flawed and do not reflect the fact that the model is given pure Gaussian noise at inference, creating a discrepancy between training and inference. We show that the flawed design causes real problems in existing implementations. In Stable Diffusion, it severely limits the model to only generate images with medium brightness and prevents it from generating very bright and dark samples. We propose a few simple fixes: (1) rescale the noise schedule to enforce zero terminal SNR; (2) train the model with v prediction; (3) change the sampler to always start from the last timestep; (4) rescale classifier-free guidance to prevent over-exposure. These simple changes ensure the diffusion process is congruent between training and inference and allow the model to generate samples more faithful to the original data distribution.
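Fix (1) can be illustrated with a short sketch. The abstract does not spell out the rescaling procedure, so the version below is one plausible reading: shift and scale the square root of the cumulative alpha product so that its last value becomes exactly zero while its first value is unchanged. The function names, the 1000-step count, and the scaled-linear beta range (0.00085 to 0.012) are assumptions chosen for illustration, not values taken from the text above.

```python
import numpy as np


def rescale_zero_terminal_snr(betas):
    """Return betas rescaled so that the final timestep has zero SNR.

    Assumption: the rescaling is a linear shift-and-scale of
    sqrt(alpha_bar) that keeps the first value and zeroes the last.
    """
    alphas = 1.0 - betas
    alphas_bar_sqrt = np.sqrt(np.cumprod(alphas))

    first = alphas_bar_sqrt[0]
    last = alphas_bar_sqrt[-1]

    # Shift so the last value is 0, then scale so the first value is restored.
    alphas_bar_sqrt = (alphas_bar_sqrt - last) * first / (first - last)

    # Convert the cumulative product back to per-step alphas, then to betas.
    alphas_bar = alphas_bar_sqrt ** 2
    alphas = np.concatenate([alphas_bar[:1], alphas_bar[1:] / alphas_bar[:-1]])
    return 1.0 - alphas


def snr(betas):
    """Signal-to-noise ratio alpha_bar / (1 - alpha_bar) at every timestep."""
    alphas_bar = np.cumprod(1.0 - betas)
    return alphas_bar / (1.0 - alphas_bar)


# Hypothetical scaled-linear schedule, used only as an example input.
betas = np.linspace(0.00085 ** 0.5, 0.012 ** 0.5, 1000) ** 2
fixed = rescale_zero_terminal_snr(betas)

terminal_before = snr(betas)[-1]  # small but strictly positive
terminal_after = snr(fixed)[-1]   # zero after rescaling
```

With the rescaled schedule the last beta equals 1, so the final noised sample carries no signal. This matches the pure Gaussian noise the model receives at inference, which is the discrepancy the abstract describes. It is also a plausible reason for pairing this fix with v prediction in fix (2): at zero SNR the input is pure noise, so predicting that noise tells the model nothing about the image.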