Using human feedback to fine-tune diffusion models without any reward model

Lodge: A coarse to fine diffusion network for long dance generation guided by the characteristic dance primitives

R Li, YX Zhang, Y Zhang, H Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

We propose Lodge, a network capable of generating extremely long dance sequences
conditioned on given music. We design Lodge as a two-stage coarse to fine diffusion …

Cited by 3 - Related articles - All 4 versions

[PDF] arxiv.org

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …

Cited by 5 - Related articles - All 3 versions

[PDF] arxiv.org

Aligning diffusion models by optimizing human utility

S Li, K Kallidromitis, A Gokul, Y Kato… - arXiv preprint arXiv …, 2024 - arxiv.org

We present Diffusion-KTO, a novel approach for aligning text-to-image diffusion models by
formulating the alignment objective as the maximization of expected human utility. Since this …

Cited by 4 - Related articles - All 2 versions

[PDF] acm.org

Participation in the age of foundation models

H Suresh, E Tseng, M Young, M Gray… - The 2024 ACM …, 2024 - dl.acm.org

Growing interest and investment in the capabilities of foundation models has positioned
such systems to impact a wide array of services, from banking to healthcare. Alongside …

ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization

L Eyring, S Karthik, K Roth, A Dosovitskiy… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-Image (T2I) models have made significant advancements in recent years, but they
still struggle to accurately capture intricate details specified in complex compositional …

Related articles - All 2 versions

[PDF] arxiv.org

Bridging Model-Based Optimization and Generative Modeling via Conservative Fine-Tuning of Diffusion Models

M Uehara, Y Zhao, E Hajiramezanali, G Scalia… - arXiv preprint arXiv …, 2024 - arxiv.org

AI-driven design problems, such as DNA/protein sequence design, are commonly tackled
from two angles: generative modeling, which efficiently captures the feasible design space …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Z Liang, Y Yuan, S Gu, B Chen, T Hang, J Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, Direct Preference Optimization (DPO) has extended its success from aligning
large language models (LLMs) to aligning text-to-image diffusion models with human …

Related articles - All 2 versions

[PDF] arxiv.org

PopAlign: Population-Level Alignment for Fair Text-to-Image Generation

S Li, H Singh, A Grover - arXiv preprint arXiv:2406.19668, 2024 - arxiv.org

Text-to-image (T2I) models achieve high-fidelity generation through extensive training on
large datasets. However, these models may unintentionally pick up undesirable biases of …

Related articles - All 2 versions

[PDF] arxiv.org

Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback

C Chen, Y Hu, W Wu, H Wang, ES Chng… - arXiv preprint arXiv …, 2024 - arxiv.org

In recent years, text-to-speech (TTS) technology has witnessed impressive advancements,
particularly with large-scale training datasets, showcasing human-level speech quality and …

Cited by 1 - Related articles - All 2 versions

[PDF] arxiv.org

YaART: Yet Another ART Rendering Technology

S Kastryulin, A Konev, A Shishenya… - arXiv preprint arXiv …, 2024 - arxiv.org

In the rapidly progressing field of generative models, the development of efficient and
high-fidelity text-to-image diffusion systems represents a significant frontier. This study introduces …

Related articles - All 2 versions