Visual prompting in multimodal large language models: A survey

J Wu, Z Zhang, Y Xia, X Li, Z Xia, A Chang, T Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) equip pre-trained large-language models
(LLMs) with visual capabilities. While textual prompting in LLMs has been widely studied …

MagicQuill: An intelligent interactive image editing system

Z Liu, Y Yu, H Ouyang, Q Wang, KL Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Image editing involves a variety of complex tasks and requires efficient and precise
manipulation techniques. In this paper, we present MagicQuill, an integrated image editing …

Patterns of Creativity: How User Input Shapes AI-Generated Visual Diversity

MTDR Palmini, E Cetinic - arXiv preprint arXiv:2410.06768, 2024 - arxiv.org
Recent critiques of Artificial-intelligence (AI)-generated visual content highlight concerns
about the erosion of artistic originality, as these systems often replicate patterns from their …

MemoVis: A GenAI-Powered Tool for Creating Companion Reference Images for 3D Design Feedback

C Chen, C Nguyen, T Groueix, VG Kim… - ACM Transactions on …, 2024 - dl.acm.org
Providing asynchronous feedback is a critical step in the 3D design workflow. A common
approach to providing feedback is to pair textual comments with companion reference …

Integrating quantum CI and generative AI for Taiwanese/English co-learning

CS Lee, MH Wang, CY Chen, SC Yang… - Quantum Machine …, 2024 - Springer
This paper proposes a quantum computational intelligence (QCI) model integrated with
generative artificial intelligence (GAI) for Taiwanese/English language co-learning …

StyleFactory: Towards Better Style Alignment in Image Creation through Style-Strength-Based Control and Evaluation

M Zhou, D Zhang, W You, Z Yu, Y Wu, C Pan… - Proceedings of the 37th …, 2024 - dl.acm.org
Generative AI models have been widely used for image creation. However, generating
images that are well-aligned with users' personal styles on aesthetic features (e.g., color and …

Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets

C Chen, F Lv, Y Guan, P Wang, S Yu, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The performance of computer vision models in certain real-world applications (e.g., rare
wildlife observation) is limited by the small number of available images. Expanding datasets …

I Can Embrace and Avoid Vagueness Myself: Supporting the Design Process by Balancing Vagueness through Text-to-Image Generative AI

M Kim, B Kim, K Han - arXiv preprint arXiv:2411.08588, 2024 - arxiv.org
This study examines the role of vagueness in the design process and its strategic
management for the effective human-AI interaction. While vagueness in the generation of …

Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy

S Zhai, H Chen, Y Dong, J Li, Q Shen, Y Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image diffusion models have achieved tremendous success in the field of
controllable image generation, while also coming along with issues of privacy leakage and …

Mapping the Mind of an Instruction-based Image Editing using SMILE

Z Dehghani, K Aslansefat, A Khan, AR Rivera… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite recent advancements in Instruct-based Image Editing models for generating high-
quality images, they are known as black boxes and a significant barrier to transparency and …