Pic2Word: Mapping pictures to words for zero-shot composed image retrieval

K Saito, K Sohn, X Zhang, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
In Composed Image Retrieval (CIR), a user combines a query image with text to
describe their intended target. Existing methods rely on supervised learning of CIR models …

FAME-ViL: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

Controllable person image synthesis with pose-constrained latent diffusion

X Han, X Zhu, J Deng, YZ Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
Controllable person image synthesis aims at rendering a source image based on user-
specified changes in body pose or appearance. Prior art approaches leverage pixel-level …

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

S Koley, AK Bhunia, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com
Two primary input modalities prevail in image retrieval: sketch and text. While text is widely
used for inter-category retrieval tasks, sketches have been established as the sole preferred …

Target-guided composed image retrieval

H Wen, X Zhang, X Song, Y Wei, L Nie - Proceedings of the 31st ACM …, 2023 - dl.acm.org
Composed image retrieval (CIR) is a new and flexible image retrieval paradigm, which can
retrieve the target image for a multimodal query, including a reference image and its …

Dual alignment unsupervised domain adaptation for video-text retrieval

X Hao, W Zhang, D Wu, F Zhu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …

Composed image retrieval with text feedback via multi-grained uncertainty regularization

Y Chen, Z Zheng, W Ji, L Qu, TS Chua - arXiv preprint arXiv:2211.07394, 2022 - arxiv.org
We investigate composed image retrieval with text feedback. Users gradually look for the
target of interest by moving from coarse to fine-grained feedback. However, existing …

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

Y Tang, J Yu, K Gai, J Zhuang, G Xiong, Y Hu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Different from the Composed Image Retrieval task that requires expensive labels for training
task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks …

A Survey on Fashion Image Retrieval

SM Islam, S Joardar, AA Sekh - ACM Computing Surveys, 2024 - dl.acm.org
Fashion is the manner in which we introduce ourselves to the world and has become
perhaps the biggest industry on the planet. In recent years, fashion-related research has …

Composed image retrieval using contrastive learning and task-oriented CLIP-based features

A Baldrati, M Bertini, T Uricchio… - ACM Transactions on …, 2023 - dl.acm.org
Given a query composed of a reference image and a relative caption, the Composed Image
Retrieval goal is to retrieve images visually similar to the reference one that integrates the …