TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation
Video generation has many unique challenges beyond those of image generation. The
temporal dimension introduces extensive possible variations across frames, over which …
Scalable Ranked Preference Optimization for Text-to-Image Generation
Direct Preference Optimization (DPO) has emerged as a powerful approach to align text-to-
image (T2I) models with human feedback. Unfortunately, successful application of DPO to …
Towards robust visual understanding: A paradigm shift in computer vision from recognition to reasoning
T Gokhale - AI Magazine, 2024 - Wiley Online Library
Abstract Models that learn from data are widely and rapidly being deployed today for real‐
world use, but they suffer from unforeseen failures that limit their reliability. These failures …
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Text-to-image (T2I) generative models, such as Stable Diffusion and DALL-E, have shown
remarkable proficiency in producing high-quality, realistic, and natural images from textual …