Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to achieve promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

VL-PET: Vision-and-language parameter-efficient tuning via granularity control

ZY Hu, Y Li, MR Lyu, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …

Parameter-efficient transfer learning for remote sensing image-text retrieval

Y Yuan, Y Zhan, Z Xiong - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …

Cross-modal adapter for text-video retrieval

H Jiang, J Zhang, R Huang, C Ge, Z Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org
Text-video retrieval is an important multi-modal learning task, where the goal is to retrieve
the most relevant video for a given text query. Recently, pre-trained models, e.g., CLIP, show …

Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing

Z Yu, R Cai, Y Cui, X Liu, Y Hu, AC Kot - International Journal of Computer …, 2024 - Springer
Recently, vision transformer (ViT) based multimodal learning methods have been proposed
to improve the robustness of face anti-spoofing (FAS) systems. However, there are still no …

KVQ: Kwai video quality assessment for short-form videos

Y Lu, X Li, Y Pei, K Yuan, Q Xie, Y Qu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Short-form UGC video platforms like Kwai and TikTok have been an emerging and
irreplaceable mainstream media form thriving on user-friendly engagement and …

End-to-end temporal action detection with 1B parameters across 1000 frames

S Liu, CL Zhang, C Zhao… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, temporal action detection (TAD) has seen significant performance improvement
with end-to-end training. However, due to the memory bottleneck, only models with limited …

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

N Zhou, J Chen, D Huang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The visual models pretrained on large-scale benchmarks encode general knowledge and
prove effective in building more powerful representations for downstream tasks. Most …

Dec-Adapter: Exploring efficient decoder-side adapter for bridging screen content and natural image compression

S Shen, H Yue, J Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Natural image compression has been greatly improved in the deep learning era. However,
the compression performance will be heavily degraded if the pretrained encoder is directly …

PECoP: Parameter-efficient continual pretraining for action quality assessment

A Dadashzadeh, S Duan, A Whone… - Proceedings of the …, 2024 - openaccess.thecvf.com
The limited availability of labelled data in Action Quality Assessment (AQA) has forced
previous works to fine-tune their models pretrained on large-scale domain-general datasets …