PixArt-δ: Fast and controllable image generation with latent consistency models

Z Zhu, X Wang, W Zhao, C Min, N Deng, M Dou… - arXiv preprint arXiv …, 2024 - arxiv.org

General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …

Cited by 20 · Related articles · All 3 versions

BIGbench: A unified benchmark for social bias in text-to-image generative models based on multi-modal LLM

H Luo, H Huang, Z Deng, X Liu, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Text-to-Image (T2I) generative models are becoming increasingly crucial due to their ability
to generate high-quality images, which also raises concerns about the social biases in their …

Cited by 6 · Related articles · All 3 versions

[PDF] A Comprehensive Survey of Recent Transformers in Image, Video and Diffusion Models.

DPC Le, D Wang, VT Le - Computers, Materials & Continua, 2024 - cdn.techscience.cn

Transformer models have emerged as dominant networks for various tasks in computer
vision compared to Convolutional Neural Networks (CNNs). The transformers demonstrate …

Cited by 1 · Related articles · All 3 versions

Efficient diffusion transformer with step-wise dynamic attention mediators

Y Pu, Z Xia, J Guo, D Han, Q Li, D Li, Y Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …

Cited by 3 · Related articles · All 5 versions

Investigating Deep Watermark Security: An Adversarial Transferability Perspective

B Qi, J Gao, Y Luo, J Liu, L Wu, B Zhou - arXiv preprint arXiv:2402.16397, 2024 - arxiv.org

The rise of generative neural networks has triggered an increased demand for intellectual
property (IP) protection in generated content. Deep watermarking techniques, recognized for …

Cited by 2 · Related articles · All 2 versions

Dynamic diffusion transformer

W Zhao, Y Han, J Tang, K Wang, Y Song… - arXiv preprint arXiv …, 2024 - arxiv.org

Diffusion Transformer (DiT), an emerging diffusion model for image generation, has
demonstrated superior performance but suffers from substantial computational costs. Our …

Cited by 1 · Related articles · All 2 versions

Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions

A Taghipour, M Ghahremani, M Bennamoun… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion
(SVD) framework, focusing on their impact on video generation quality and computational …

Cited by 1 · Related articles · All 2 versions

EdgeFusion: On-Device Text-to-Image Generation

T Castells, HK Song, T Piao, S Choi, BK Kim… - arXiv preprint arXiv …, 2024 - arxiv.org

The intensive computational burden of Stable Diffusion (SD) for text-to-image generation
poses a significant hurdle for its practical application. To tackle this challenge, recent …

Cited by 3 · Related articles · All 2 versions

LoTLIP: Improving Language-Image Pre-training for Long Text Understanding

W Wu, K Zheng, S Ma, F Lu, Y Guo, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Understanding long text is of great demands in practice but beyond the reach of most
language-image pre-training (LIP) models. In this work, we empirically confirm that the key …

CityCraft: A Real Crafter for 3D City Generation

J Deng, W Chai, J Huang, Z Zhao, Q Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

City scene generation has gained significant attention in autonomous driving, smart city
development, and traffic simulation. It helps enhance infrastructure planning and monitoring …

Cited by 4 · Related articles · All 2 versions