Pic2word: Mapping pictures to words for zero-shot composed image retrieval
In Composed Image Retrieval (CIR), a user combines a query image with text to
describe their intended target. Existing methods rely on supervised learning of CIR models …
Fame-vil: Multi-tasking vision-language model for heterogeneous fashion tasks
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …
Controllable person image synthesis with pose-constrained latent diffusion
Controllable person image synthesis aims at rendering a source image based on user-
specified changes in body pose or appearance. Prior art approaches leverage pixel-level …
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Two primary input modalities prevail in image retrieval: sketch and text. While text is widely
used for inter-category retrieval tasks, sketches have been established as the sole preferred …
Target-guided composed image retrieval
Composed image retrieval (CIR) is a new and flexible image retrieval paradigm, which can
retrieve the target image for a multimodal query, including a reference image and its …
Dual alignment unsupervised domain adaptation for video-text retrieval
Video-text retrieval is an emerging stream in both computer vision and natural language
processing communities, which aims to find relevant videos given text queries. In this paper …
Composed image retrieval with text feedback via multi-grained uncertainty regularization
We investigate composed image retrieval with text feedback. Users gradually look for the
target of interest by moving from coarse to fine-grained feedback. However, existing …
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval
Different from the Composed Image Retrieval task that requires expensive labels for training
task-specific models, Zero-Shot Composed Image Retrieval (ZS-CIR) involves diverse tasks …
A Survey on Fashion Image Retrieval
Fashion is the manner in which we introduce ourselves to the world and has become
perhaps the biggest industry on the planet. In recent years, fashion-related research has …
Composed image retrieval using contrastive learning and task-oriented clip-based features
Given a query composed of a reference image and a relative caption, the Composed Image
Retrieval goal is to retrieve images visually similar to the reference one that integrates the …