Is Sora a world simulator? A comprehensive survey on general world models and beyond
General world models represent a crucial pathway toward achieving Artificial General
Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual …
Bigbench: A unified benchmark for social bias in text-to-image generative models based on multi-modal LLM
Text-to-Image (T2I) generative models are becoming increasingly crucial due to their ability
to generate high-quality images, which also raises concerns about the social biases in their …
A Comprehensive Survey of Recent Transformers in Image, Video and Diffusion Models.
DPC Le, D Wang, VT Le - Computers, Materials & Continua, 2024 - cdn.techscience.cn
Transformer models have emerged as dominant networks for various tasks in computer
vision compared to Convolutional Neural Networks (CNNs). The transformers demonstrate …
Efficient diffusion transformer with step-wise dynamic attention mediators
This paper identifies significant redundancy in the query-key interactions within self-attention
mechanisms of diffusion transformer models, particularly during the early stages of …
Investigating Deep Watermark Security: An Adversarial Transferability Perspective
B Qi, J Gao, Y Luo, J Liu, L Wu, B Zhou - arXiv preprint arXiv:2402.16397, 2024 - arxiv.org
The rise of generative neural networks has triggered an increased demand for intellectual
property (IP) protection in generated content. Deep watermarking techniques, recognized for …
Dynamic diffusion transformer
Diffusion Transformer (DiT), an emerging diffusion model for image generation, has
demonstrated superior performance but suffers from substantial computational costs. Our …
Faster Image2Video Generation: A Closer Look at CLIP Image Embedding's Impact on Spatio-Temporal Cross-Attentions
This paper investigates the role of CLIP image embeddings within the Stable Video Diffusion
(SVD) framework, focusing on their impact on video generation quality and computational …
EdgeFusion: On-Device Text-to-Image Generation
The intensive computational burden of Stable Diffusion (SD) for text-to-image generation
poses a significant hurdle for its practical application. To tackle this challenge, recent …
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
Understanding long text is of great demands in practice but beyond the reach of most
language-image pre-training (LIP) models. In this work, we empirically confirm that the key …
CityCraft: A Real Crafter for 3D City Generation
City scene generation has gained significant attention in autonomous driving, smart city
development, and traffic simulation. It helps enhance infrastructure planning and monitoring …