Towards zero-shot learning: A brief review and an attention-based embedding network
Zero-shot learning (ZSL), an emerging topic in recent years, targets at distinguishing unseen
class images by taking images from seen classes for training the classifier. Existing works …
Deep multimodal transfer learning for cross-modal retrieval
Cross-modal retrieval (CMR) enables flexible retrieval experience across different
modalities (e.g., texts versus images), which maximally benefits us from the abundance of …
Joint feature synthesis and embedding: Adversarial cross-modal retrieval revisited
X Xu, K Lin, Y Yang, A Hanjalic… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Recently, generative adversarial network (GAN) has shown its strong ability on modeling
data distribution via adversarial learning. Cross-modal GAN, which attempts to utilize the …
TextControlGAN: Text-to-image synthesis with controllable generative adversarial networks
H Ku, M Lee - Applied Sciences, 2023 - mdpi.com
Generative adversarial networks (GANs) have demonstrated remarkable potential in the
realm of text-to-image synthesis. Nevertheless, conventional GANs employing conditional …
Adversarial-metric learning for audio-visual cross-modal matching
A Zheng, M Hu, B Jiang, Y Huang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Audio-visual matching aims to learn the intrinsic correspondence between image and audio
clip. Existing works mainly concentrate on learning discriminative features, while ignore the …
Discriminative and robust attribute alignment for zero-shot learning
Zero-shot learning (ZSL) aims to learn models that can recognize images of semantically
related unseen categories, through transferring attribute-based knowledge learned from …
Region reinforcement network with topic constraint for image-text matching
Image and sentence matching has attracted increasing attention since it is associated with
two important modalities of vision and language. Previous methods aim to find the latent …
Bridge-GAN: Interpretable representation learning for text-to-image synthesis
Text-to-image synthesis is to generate images with the consistent content as the given text
description, which is a highly challenging task with two main issues: visual reality and …
Image-text retrieval with cross-modal semantic importance consistency
Cross-modal image-text retrieval is an important area of Vision-and-Language task that
models the similarity of image-text pairs by embedding features into a shared space for …
Dual-aligned feature confusion alleviation for generalized zero-shot learning
Generalized zero-shot learning (GZSL) aims to recognize both seen and unseen samples by
leveraging the connections between semantic and visual representations. Recently, a …