LayoutGPT: Compositional visual planning and generation with large language models

W Feng, W Zhu, T Fu, V Jampani… - Advances in …, 2024 - proceedings.neurips.cc
Attaining a high degree of user controllability in visual generation often requires intricate,
fine-grained inputs like layouts. However, such inputs impose a substantial burden on users …

MagicBrush: A manually annotated dataset for instruction-guided image editing

K Zhang, L Mo, W Chen, H Sun… - Advances in Neural …, 2024 - proceedings.neurips.cc
Text-guided image editing is widely needed in daily life, ranging from personal use to
professional applications such as Photoshop. However, existing methods are either zero …

Training-free structured diffusion guidance for compositional text-to-image synthesis

W Feng, X He, TJ Fu, V Jampani, A Akula… - arXiv preprint arXiv …, 2022 - arxiv.org
Large-scale diffusion models have achieved state-of-the-art results on text-to-image
synthesis (T2I) tasks. Despite their ability to generate high-quality yet creative images, we …

Counterfactual VQA: A cause-effect look at language bias

Y Niu, K Tang, H Zhang, Z Lu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language. In this paper …

Talk-to-edit: Fine-grained facial editing via dialog

Y Jiang, Z Huang, X Pan, CC Loy… - Proceedings of the …, 2021 - openaccess.thecvf.com
Facial editing is an important task in vision and graphics with numerous applications.
However, existing works are incapable of delivering a continuous and fine-grained editing …

Guiding instruction-based image editing via multimodal large language models

TJ Fu, W Hu, X Du, WY Wang, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Instruction-based image editing improves the controllability and flexibility of image
manipulation via natural commands without elaborate descriptions or regional masks …

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

TJ Fu, L Yu, N Zhang, CY Fu, JC Su… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating a video given the first several static frames is challenging as it anticipates
reasonable future frames with temporal coherence. Besides video prediction, the ability to …

Language-driven artistic style transfer

TJ Fu, XE Wang, WY Wang - European Conference on Computer Vision, 2022 - Springer
Despite having promising results, style transfer, which requires preparing style images in
advance, may result in lack of creativity and accessibility. Following human instruction, on …

Talk-to-edit: Fine-grained 2D and 3D facial editing via dialog

Y Jiang, Z Huang, T Wu, X Pan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Facial editing is to manipulate the facial attributes of a given face image. Nowadays, with the
development of generative models, users can easily generate 2D and 3D facial images with …

Iterative multi-granular image editing using diffusion models

KJ Joseph, P Udhayanan, T Shukla… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in text-guided image synthesis have dramatically changed how creative
professionals generate artistic and aesthetically pleasing visual assets. To fully support such …