Iteratively Prompting Multimodal LLMs to Reproduce Natural and AI-Generated Images

A Naseh, K Thai, M Iyyer, A Houmansadr - arXiv preprint arXiv:2404.13784, 2024 - arxiv.org
With the digital imagery landscape rapidly evolving, image stocks and AI-generated image
marketplaces have become central to visual media. Traditional stock images now exist …

Diffutoon: High-Resolution Editable Toon Shading via Diffusion Models

Z Duan, C Wang, C Chen, W Qian, J Huang - arXiv preprint arXiv …, 2024 - arxiv.org
Toon shading is a type of non-photorealistic rendering task of animation. Its primary purpose
is to render objects with a flat and stylized appearance. As diffusion models have ascended …

What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance

Y Liu, M He, F Yao, Y Ji, S Tao, J Du, D Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of text-to-image synthesis (TIS) models has significantly influenced digital
image creation by producing high-quality visuals from written descriptions. Yet these models …

Attentive Linguistic Tracking in Diffusion Models for Training-free Text-guided Image Editing

B Liu, C Wang, J Huang, K Jia - ACM Multimedia 2024, 2024 - openreview.net
Building on recent breakthroughs in diffusion-based text-to-image synthesis (TIS), training-
free text-guided image editing (TIE) has become an indispensable aspect of modern image …

“It's like a rubber duck that talks back”: Understanding Generative AI-Assisted Data Analysis Workflows through a Participatory Prompting Study

I Drosos, A Sarkar, X Xu, C Negreanu, S Rintel… - Proceedings of the 3rd …, 2024 - advait.org
Generative AI tools can help users with many tasks. One such task is data analysis, which is
notoriously challenging for non-expert end-users due to its expertise requirements, and …

DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation

J Wang, C Wang, T Cao, J Huang, L Jin - arXiv preprint arXiv:2403.04997, 2024 - arxiv.org
We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with
prompt-as-input Text-to-Image Synthesis (TIS) models (e.g., Stable Diffusion) for interactive …

Product2IMG: Prompt-Free E-commerce Product Background Generation with Diffusion Model and Self-Improved LMM

T Cao, J Kong, X Zhao, W Yao, J Ding, J Zhu… - ACM Multimedia …, 2024 - openreview.net
In e-commerce platforms, visual content plays a pivotal role in capturing and retaining
audience attention. A high-quality and aesthetically designed product background image …

"Imagine a Dress": Exploring the case of task-specific prompt assistants for text-to-image AI tools

HW Chen, L Istead - Proceedings of the 50th Graphics Interface …, 2024 - dl.acm.org
In this paper, we explore the impact of task-specific prompt assistants for text-to-image
generative AI tools through a user study. Participants were asked to recreate a dress with …

SSP: A Simple and Safe automatic Prompt engineering method towards realistic image synthesis on LVM

W Cheng, J Liu, J Deng, F Ren - arXiv preprint arXiv:2401.01128, 2024 - arxiv.org
Recently, text-to-image (T2I) synthesis has undergone significant advancements, particularly
with the emergence of Large Language Models (LLM) and their enhancement in Large …

E-Bench: Towards Evaluating the Ease-of-Use of Large Language Models

Z Zhang, B Hao, J Li, Z Zhang, D Zhao - arXiv preprint arXiv:2406.10950, 2024 - arxiv.org
Most large language models (LLMs) are sensitive to prompts, and another synonymous
expression or a typo may lead to unexpected results for the model. Composing an optimal …