Audio-Visual Segmentation via Unlabeled Frame Exploitation
Audio-visual segmentation (AVS) aims to segment the sounding objects in video frames.
Although great progress has been witnessed, we experimentally reveal that current methods …
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …
Transformer-Empowered Multi-Modal Item Embedding for Enhanced Image Search in E-commerce
Over the past decade, significant advances have been made in the field of image search for
e-commerce applications. Traditional image-to-image retrieval models, which focus solely …
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) recently gains increasing attention, with the development of vision …