Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives

R Li, YX Zhang, Y Zhang, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose Lodge, a network capable of generating extremely long dance sequences
conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion …

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …

Aligning diffusion models by optimizing human utility

S Li, K Kallidromitis, A Gokul, Y Kato… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by
formulating the alignment objective as the maximization of expected human utility. Since this …

Participation in the age of foundation models

H Suresh, E Tseng, M Young, M Gray… - The 2024 ACM …, 2024 - dl.acm.org
Growing interest and investment in the capabilities of foundation models has positioned
such systems to impact a wide array of services, from banking to healthcare. Alongside …

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

L Eyring, S Karthik, K Roth, A Dosovitskiy… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-Image (T2I) models have made significant advancements in recent years, but they
still struggle to accurately capture intricate details specified in complex compositional …

Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

M Uehara, Y Zhao, E Hajiramezanali, G Scalia… - arXiv preprint arXiv …, 2024 - arxiv.org
AI-driven design problems, such as DNA/protein sequence design, are commonly tackled
from two angles: generative modeling, which efficiently captures the feasible design space …

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Z Liang, Y Yuan, S Gu, B Chen, T Hang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Direct Preference Optimization (DPO) has extended its success from aligning
large language models (LLMs) to aligning text-to-image diffusion models with human …

PopAlign: Population-Level Alignment for Fair Text-to-Image Generation

S Li, H Singh, A Grover - arXiv preprint arXiv:2406.19668, 2024 - arxiv.org
Text-to-image (T2I) models achieve high-fidelity generation through extensive training on
large datasets. However, these models may unintentionally pick up undesirable biases of …

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

C Chen, Y Hu, W Wu, H Wang, ES Chng… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, text-to-speech (TTS) technology has witnessed impressive advancements,
particularly with large-scale training datasets, showcasing human-level speech quality and …

YaART: Yet Another ART Rendering Technology

S Kastryulin, A Konev, A Shishenya… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly progressing field of generative models, the development of efficient and
high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces …