Unified visual-semantic embeddings: Bridging vision and language with structured meaning representations

Authors

Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma

Publication date

2019

Conference

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Pages

6609-6618

Description

We propose the Unified Visual-Semantic Embeddings (Unified VSE) for learning a joint space of visual representation and textual semantics. The model unifies the embeddings of concepts at different levels: objects, attributes, relations, and full scenes. We view the sentential semantics as a combination of different semantic components such as objects and relations; their embeddings are aligned with different image regions. A contrastive learning approach is proposed for the effective learning of this fine-grained alignment from only image-caption pairs. We also present a simple yet effective approach that enforces the coverage of caption embeddings on the semantic components that appear in the sentence. We demonstrate that the Unified VSE outperforms baselines on cross-modal retrieval tasks; the enforcement of the semantic coverage improves the model's robustness in defending against text-domain adversarial attacks. Moreover, our model empowers the use of visual cues to accurately resolve word dependencies in novel sentences.

Total citations

Cited by 176

Citations per year: 2019: 6, 2020: 20, 2021: 34, 2022: 39, 2023: 51, 2024: 26
