Multimodal learning with online text cleaning for e-commerce product search

Z Hu, S Li, M Du, A Dhua, D Gray - 2024 - amazon.science
Vision-language transformer models play a pivotal role in e-commerce product search.
When using product description (e.g., product title) and product image pairs to train such …

De-noised Vision-language Fusion Guided by Visual Cues for E-commerce Product Search

Z Hu, S Li, M Du, A Dhua… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In e-commerce applications, vision-language multimodal transformer models play a pivotal
role in product search. The key to successfully training a multimodal model lies in the …