Fashionvil: Fashion-focused vision-and-language representation learning

K Saito, K Sohn, X Zhang, CL Li… - Proceedings of the …, 2023 - openaccess.thecvf.com

Abstract In Composed Image Retrieval (CIR), a user combines a query image with text to
describe their intended target. Existing methods rely on supervised learning of CIR models …

Cited by 74 · Related articles · All 9 versions

Fame-vil: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

Cited by 25 · Related articles · All 8 versions

Controllable person image synthesis with pose-constrained latent diffusion

X Han, X Zhu, J Deng, YZ Song… - Proceedings of the …, 2023 - openaccess.thecvf.com

Controllable person image synthesis aims at rendering a source image based on user-
specified changes in body pose or appearance. Prior art approaches leverage pixel-level …

Cited by 13 · Related articles · All 3 versions

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

S Koley, AK Bhunia, A Sain… - Proceedings of the …, 2024 - openaccess.thecvf.com

Two primary input modalities prevail in image retrieval: sketch and text. While text is widely
used for inter-category retrieval tasks, sketches have been established as the sole preferred …

Cited by 7 · Related articles · All 3 versions

Target-guided composed image retrieval

H Wen, X Zhang, X Song, Y Wei, L Nie - Proceedings of the 31st ACM …, 2023 - dl.acm.org

Composed image retrieval (CIR) is a new and flexible image retrieval paradigm, which can
retrieve the target image for a multimodal query, including a reference image and its …

Cited by 20 · Related articles · All 3 versions

Dual alignment unsupervised domain adaptation for video-text retrieval

X Hao, W Zhang, D Wu, F Zhu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …

Cited by 15 · Related articles · All 3 versions

Composed image retrieval with text feedback via multi-grained uncertainty regularization

Y Chen, Z Zheng, W Ji, L Qu, TS Chua - arXiv preprint arXiv:2211.07394, 2022 - arxiv.org

We investigate composed image retrieval with text feedback. Users gradually look for the
target of interest by moving from coarse to fine-grained feedback. However, existing …

Cited by 28 · Related articles · All 3 versions

Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval

Y Tang, J Yu, K Gai, J Zhuang, G Xiong, Y Hu… - Proceedings of the AAAI …, 2024 - ojs.aaai.org

Different from the Composed Image Retrieval task that requires expensive labels for training
task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks …

Cited by 17 · Related articles · All 4 versions

A Survey on Fashion Image Retrieval

SM Islam, S Joardar, AA Sekh - ACM Computing Surveys, 2024 - dl.acm.org

Fashion is the manner in which we introduce ourselves to the world and has become
perhaps the biggest industry on the planet. In recent years, fashion-related research has …

Composed image retrieval using contrastive learning and task-oriented clip-based features

A Baldrati, M Bertini, T Uricchio… - ACM Transactions on …, 2023 - dl.acm.org

Given a query composed of a reference image and a relative caption, the Composed Image
Retrieval goal is to retrieve images visually similar to the reference one that integrates the …

Cited by 14 · Related articles · All 6 versions