An overview of diffusion models: Applications, guided generation, statistical rates and optimization

M Chen, S Mei, J Fan, M Wang - arXiv preprint arXiv:2404.07771, 2024 - arxiv.org
Diffusion models, a powerful and universal generative AI technology, have achieved
tremendous success in computer vision, audio, reinforcement learning, and computational …

The Llama 3 herd of models

A Dubey, A Jauhri, A Pandey, A Kadian… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern artificial intelligence (AI) systems are powered by foundation models. This paper
presents a new set of foundation models, called Llama 3. It is a herd of language models …

Cross-attention makes inference cumbersome in text-to-image diffusion models

W Zhang, H Liu, J Xie, F Faccio, MZ Shou… - arXiv preprint arXiv …, 2024 - arxiv.org
This study explores the role of cross-attention during inference in text-conditional diffusion
models. We find that cross-attention outputs converge to a fixed point after few inference …

Diffusion model for data-driven black-box optimization

Z Li, H Yuan, K Huang, C Ni, Y Ye, M Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI has redefined artificial intelligence, enabling the creation of innovative content
and customized solutions that drive business practices into a new era of efficiency and …

Align your steps: Optimizing sampling schedules in diffusion models

A Sabour, S Fidler, K Kreis - arXiv preprint arXiv:2404.14507, 2024 - arxiv.org
Diffusion models (DMs) have established themselves as the state-of-the-art generative
modeling approach in the visual domain and beyond. A crucial drawback of DMs is their …

Learning Diffusion at Lightspeed

A Terpin, N Lanzetti, F Dörfler - arXiv preprint arXiv:2406.12616, 2024 - arxiv.org
Diffusion regulates a phenomenal number of natural processes and the dynamics of many
successful generative models. Existing models to learn the diffusion terms from …

Long-form music generation with latent diffusion

Z Evans, JD Parker, CJ Carr, Z Zukowski… - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-based generative models for music have seen great strides recently, but so far have
not managed to produce full-length music tracks with coherent musical structure. We show …

Generative Image as Action Models

M Shridhar, YL Lo, S James - arXiv preprint arXiv:2407.07875, 2024 - arxiv.org
Image-generation diffusion models have been fine-tuned to unlock new capabilities such as
image-editing and novel view synthesis. Can we similarly unlock image-generation models …

Magic Clothing: Controllable Garment-Driven Image Synthesis

W Chen, T Gu, Y Xu, C Chen - arXiv preprint arXiv:2404.09512, 2024 - arxiv.org
We propose Magic Clothing, a latent diffusion model (LDM)-based network architecture for
an unexplored garment-driven image synthesis task. Aiming at generating customized …

Dimba: Transformer-Mamba Diffusion Models

Z Fei, M Fan, C Yu, D Li, Y Zhang, J Huang - arXiv preprint arXiv …, 2024 - arxiv.org
This paper unveils Dimba, a new text-to-image diffusion model that employs a distinctive
hybrid architecture combining Transformer and Mamba elements. Specifically, Dimba …