Authors
Erkun Yang, Dongren Yao, Tongliang Liu, Cheng Deng
Publication date
2022
Conference
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Pages
7551-7560
Description
Deep cross-modal hashing has become an essential tool for supervised multimodal search. These models tend to be optimized with large, curated multimodal datasets, where most labels have been manually verified. Unfortunately, in many scenarios, such accurate labeling may not be available. In contrast, datasets with low-quality annotations may be acquired, which inevitably introduce numerous mistakes or label noise and therefore degrade the search performance. To address this challenge, we present a general robust cross-modal hashing framework to correlate distinct modalities and combat noisy labels simultaneously. More specifically, we propose a proxy-based contrastive (PC) loss to mitigate the gap between different modalities and train networks for different modalities jointly with small-loss samples that are selected with the PC loss and a mutual quantization loss. The small-loss sample selection from such a joint loss can help choose confident examples to guide the model training, and the mutual quantization loss can maximize the agreement between different modalities and is beneficial to improve the effectiveness of sample selection. Experiments on three widely used multimodal datasets show that our method significantly outperforms existing state-of-the-art methods.
Total citations
Scholar articles
E Yang, D Yao, T Liu, C Deng - Proceedings of the IEEE/CVF Conference on Computer …, 2022
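The abstract combines three ingredients: a proxy-based contrastive (PC) loss to align modalities, a mutual quantization loss to make the two modalities' binary codes agree, and small-loss sample selection over their joint loss to filter noisy labels. The sketch below is a minimal, hypothetical PyTorch illustration of that recipe, not the authors' implementation: it assumes cosine similarity to learnable class proxies, a symmetric L2 quantization term against the other modality's sign codes, and a keep-ratio hyperparameter; the function names (pc_loss, mutual_quantization_loss, small_loss_update) are my own.

```python
# Hypothetical sketch of PC loss + mutual quantization + small-loss selection.
# Assumptions (not from the paper): cosine-similarity proxies, L2 quantization
# against sign codes of the other modality, fixed keep_ratio for selection.
import torch
import torch.nn.functional as F


def pc_loss(embeddings, labels, proxies, temperature=0.1):
    """Per-sample proxy-based contrastive loss (illustrative formulation)."""
    # Cosine similarity between L2-normalized codes and learnable class proxies.
    sims = F.normalize(embeddings, dim=1) @ F.normalize(proxies, dim=1).t()
    return F.cross_entropy(sims / temperature, labels, reduction="none")


def mutual_quantization_loss(codes_a, codes_b):
    """Pull each modality's continuous codes toward the other's binary codes."""
    return ((codes_a - torch.sign(codes_b.detach())) ** 2).mean(dim=1) + \
           ((codes_b - torch.sign(codes_a.detach())) ** 2).mean(dim=1)


def small_loss_update(img_codes, txt_codes, labels, proxies, keep_ratio=0.7):
    """Joint loss plus small-loss selection: train on the most confident samples."""
    joint = (pc_loss(img_codes, labels, proxies)
             + pc_loss(txt_codes, labels, proxies)
             + mutual_quantization_loss(img_codes, txt_codes))
    k = max(1, int(keep_ratio * joint.numel()))
    selected = torch.topk(joint, k, largest=False).indices  # smallest-loss samples
    return joint[selected].mean()  # back-propagate only the selected samples


# Usage with random tensors: batch of 8, 32-bit codes, 10 classes.
if __name__ == "__main__":
    img = torch.randn(8, 32, requires_grad=True)
    txt = torch.randn(8, 32, requires_grad=True)
    proxies = torch.randn(10, 32, requires_grad=True)
    labels = torch.randint(0, 10, (8,))
    loss = small_loss_update(img, txt, labels, proxies)
    loss.backward()
    print(float(loss))
```

The small-loss rule follows the general observation that mislabeled samples tend to incur larger loss early in training, so keeping only the lowest-loss fraction of each batch acts as a noise filter; coupling the selection to the joint PC-plus-quantization loss is what the abstract credits for more reliable sample selection across modalities.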