Conceptbed: Evaluating concept learning abilities of text-to-image diffusion models

C Kim, K Min, M Patel, S Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com

The rapid advancement of generative models facilitating the creation of hyper-realistic
images from textual descriptions has concurrently escalated critical societal concerns such …

Cited by 21 · Related articles · All 3 versions

Eclipse: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com

Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …

Cited by 10 · Related articles · All 3 versions

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Z Wang, L Wei, T Wang, H Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

Text-to-image (T2I) generative models have recently emerged as a powerful tool,
enabling the creation of photo-realistic images and giving rise to a multitude of applications …

Cited by 2 · Related articles · All 3 versions

A survey on knowledge-enhanced multimodal learning

M Lymperaiou, G Stamou - Artificial Intelligence Review, 2024 - Springer

Multimodal learning has been a field of increasing interest, aiming to combine various
modalities in a single joint representation. Especially in the area of visiolinguistic (VL) …

Cited by 7 · Related articles · All 4 versions

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

M Patel, S Jung, C Baral, Y Yang - arXiv preprint arXiv:2402.05195, 2024 - arxiv.org

Despite the recent advances in personalized text-to-image (P-T2I) generative models,
subject-driven T2I remains challenging. The primary bottlenecks include 1) Intensive training …

Cited by 7 · Related articles · All 4 versions

Conceptmix: A compositional image generation benchmark with controllable difficulty

X Wu, D Yu, Y Huang, O Russakovsky… - arXiv preprint arXiv …, 2024 - arxiv.org

Compositionality is a critical capability in Text-to-Image (T2I) models, as it reflects their ability
to understand and combine multiple concepts from text descriptions. Existing evaluations of …

Cited by 1 · Related articles · All 2 versions

CUPID: Contextual Understanding of Prompt‐conditioned Image Distributions

Y Zhao, M Li, M Berger - Computer Graphics Forum, 2024 - Wiley Online Library

We present CUPID: a visualization method for the contextual understanding of prompt‐
conditioned image distributions. CUPID targets the visual analysis of distributions produced …

Cited by 1 · Related articles · All 5 versions
