Mixer: Image to multi-modal retrieval learning for industrial application

4 results (0.01 sec)

[PDF] thecvf.com

Audio-Visual Segmentation via Unlabeled Frame Exploitation

J Liu, Y Liu, F Zhang, C Ju… - Proceedings of the …, 2024 - openaccess.thecvf.com

Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …

Cited by 2 · Related articles · All 4 versions

[PDF] arxiv.org

Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …

Transformer-Empowered Multi-Modal Item Embedding for Enhanced Image Search in E-commerce

C Liu, P Hou, A Zeng, H Yu - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org

Over the past decade, significant advances have been made in the field of image search for
e-commerce applications. Traditional image-to-image retrieval models, which focus solely …

DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition

H Cheng, C Ju, H Wang, J Liu, M Chen, Q Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) recently gains increasing attention, with the development of vision …

Cited by 1 · Related articles · All 2 versions