WOUAF: Weight modulation for user attribution and fingerprinting in text-to-image diffusion models

C Kim, K Min, M Patel, S Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rapid advancement of generative models facilitating the creation of hyper-realistic
images from textual descriptions has concurrently escalated critical societal concerns such …

ECLIPSE: A resource-efficient text-to-image prior for image generations

M Patel, C Kim, S Cheng, C Baral… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image (T2I) diffusion models, notably the unCLIP models (e.g., DALL-E-2),
achieve state-of-the-art (SOTA) performance on various compositional T2I benchmarks at …

Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Z Wang, L Wei, T Wang, H Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
Text-to-image (T2I) generative models have recently emerged as a powerful tool,
enabling the creation of photo-realistic images and giving rise to a multitude of applications …

A survey on knowledge-enhanced multimodal learning

M Lymperaiou, G Stamou - Artificial Intelligence Review, 2024 - Springer
Multimodal learning has been a field of increasing interest, aiming to combine various
modalities in a single joint representation. Especially in the area of visiolinguistic (VL) …

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

M Patel, S Jung, C Baral, Y Yang - arXiv preprint arXiv:2402.05195, 2024 - arxiv.org
Despite the recent advances in personalized text-to-image (P-T2I) generative models,
subject-driven T2I remains challenging. The primary bottlenecks include 1) Intensive training …

ConceptMix: A compositional image generation benchmark with controllable difficulty

X Wu, D Yu, Y Huang, O Russakovsky… - arXiv preprint arXiv …, 2024 - arxiv.org
Compositionality is a critical capability in Text-to-Image (T2I) models, as it reflects their ability
to understand and combine multiple concepts from text descriptions. Existing evaluations of …

CUPID: Contextual Understanding of Prompt‐conditioned Image Distributions

Y Zhao, M Li, M Berger - Computer Graphics Forum, 2024 - Wiley Online Library
We present CUPID: a visualization method for the contextual understanding of
prompt‐conditioned image distributions. CUPID targets the visual analysis of distributions produced …

Towards robust visual understanding: A paradigm shift in computer vision from recognition to reasoning

T Gokhale - AI Magazine, 2024 - Wiley Online Library
Models that learn from data are widely and rapidly being deployed today for
real‐world use, but they suffer from unforeseen failures that limit their reliability. These failures …

Multimodal Content Generation

M Luo, T Gokhale, N Varshney, Y Yang… - Advances in Multimodal …, 2024 - Springer
In this chapter, we will review the advances that are being made in this new field of
multimodal content generation and also discuss several challenges associated with this …

Strengthening Image Generative AI: Integrating Fingerprinting and Revision Methods for Enhanced Safety and Control

C Kim - 2024 - search.proquest.com
In the rapidly evolving field of Generative Artificial Intelligence (Gen-AI) for imaging, models
such as DALL·E 3 and Stable Diffusion have transitioned from theoretical concepts to …