Tunnel Try-on: Excavating Spatial-temporal Tunnels for High-quality Virtual Try-on in Videos
Video try-on is a challenging task and has not been well tackled in previous works. The main
obstacle lies in preserving the details of the clothing and modeling the coherent motions …
Zero-shot Image Editing with Reference Imitation
Image editing serves as a practical yet challenging task considering the diverse demands
from users, where one of the hardest parts is to precisely describe how the edited image …
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
C Ju, H Wang, H Cheng, X Chen, Z Zhai… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Large Models (VLMs) recently become primary backbone of AI, due to the
impressive performance. However, their expensive computation costs, i.e., throughput and …
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
As one of the fundamental video tasks in computer vision, Open-Vocabulary Action
Recognition (OVAR) recently gains increasing attention, with the development of vision …