Revisiting multimodal representation in contrastive learning: from patch and token embeddings to finite discrete tokens

Y Chen, J Yuan, Y Tian, S Geng, X Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive learning-based vision-language pre-training approaches, such as CLIP, have
demonstrated great success in many vision-language tasks. These methods achieve cross …

Image Preference Estimation with Word Embedding Model and Convolutional Neural Network

万樺 - 2021 - dspace02.jaist.ac.jp
To improve the traditional image recommendation system and classification methods, we
propose measuring a distance between two vectorized representations: User Preference …