Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

T2I-Adapter: Learning adapters to dig out more controllable ability for text-to-image diffusion models

C Mou, X Wang, L Xie, Y Wu, J Zhang, Z Qi… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
The incredible generative ability of large-scale text-to-image (T2I) models has demonstrated
strong power of learning complex structures and meaningful semantics. However, relying …

LayoutLLM-T2I: Eliciting layout guidance from LLM for text-to-image generation

L Qu, S Wu, H Fei, L Nie, TS Chua - Proceedings of the 31st ACM …, 2023 - dl.acm.org
In the text-to-image generation field, recent remarkable progress in Stable Diffusion makes it
possible to generate rich kinds of novel photorealistic images. However, current models still …

Generative AI for visualization: State of the art and future directions

Y Ye, J Hao, Y Hou, Z Wang, S Xiao, Y Luo, W Zeng - Visual Informatics, 2024 - Elsevier
Generative AI (GenAI) has witnessed remarkable progress in recent years and
demonstrated impressive performance in various generation tasks in different domains such …

Panacea: Panoramic and controllable video generation for autonomous driving

Y Wen, Y Zhao, Y Liu, F Jia, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The field of autonomous driving increasingly demands high-quality annotated training data.
In this paper, we propose Panacea, an innovative approach to generate panoramic and …

A unified framework for guiding generative AI with wireless perception in resource constrained mobile edge networks

J Wang, H Du, D Niyato, J Kang, Z Xiong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
With the significant advancements in artificial intelligence (AI) technologies and
computational capabilities, generative AI (GAI) has become a pivotal digital content …

BEVControl: Accurately controlling street-view elements with multi-perspective consistency via BEV sketch layout

K Yang, E Ma, J Peng, Q Guo, D Lin, K Yu - arXiv preprint arXiv …, 2023 - arxiv.org
Using synthesized images to boost the performance of perception models is a long-standing
research challenge in computer vision. It becomes more eminent in visual-centric …

FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models

A Luo, X Li, F Yang, J Liu, H Fan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Optical flow estimation, a process of predicting pixel-wise displacement between consecutive
frames, has commonly been approached as a regression task in the age of deep learning …

Shadow-Enlightened Image Outpainting

H Yu, R Li, S Xie, J Qiu - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Conventional image outpainting methods usually treat unobserved areas as unknown and
extend the scene only in terms of semantic consistency, thus overlooking the hidden …

Data augmentation for object detection via controllable diffusion models

H Fang, B Han, S Zhang, S Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
Data augmentation is vital for object detection tasks that require expensive bounding box
annotations. Recent successes in diffusion models have inspired the use of diffusion-based …