Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to achieve promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

VL-PET: Vision-and-language parameter-efficient tuning via granularity control

ZY Hu, Y Li, MR Lyu, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …

When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system

H Zhang, JJ Xu, HW Cui, L Li, Y Yang… - … and Remote Sensing …, 2024 - ieeexplore.ieee.org
Artificial intelligence (AI) has significantly advanced Earth sciences, yet its full potential in
comprehensively modeling Earth's complex dynamics remains unrealized. Geoscience …

Parameter-efficient transfer learning for remote sensing image-text retrieval

Y Yuan, Y Zhan, Z Xiong - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …

Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing

Z Yu, R Cai, Y Cui, X Liu, Y Hu, AC Kot - International Journal of Computer …, 2024 - Springer
Recently, vision transformer (ViT) based multimodal learning methods have been proposed
to improve the robustness of face anti-spoofing (FAS) systems. However, there are still no …

KVQ: Kwai video quality assessment for short-form videos

Y Lu, X Li, Y Pei, K Yuan, Q Xie, Y Qu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Short-form UGC video platforms like Kwai and TikTok have been an emerging and
irreplaceable mainstream media form thriving on user-friendly engagement and …

Cross-modal adapter for text-video retrieval

H Jiang, J Zhang, R Huang, C Ge, Z Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org
Text-video retrieval is an important multi-modal learning task, where the goal is to retrieve
the most relevant video for a given text query. Recently, pre-trained models, e.g., CLIP, show …

Parameter-efficient is not sufficient: Exploring parameter, memory, and time efficient adapter tuning for dense predictions

D Yin, X Han, B Li, H Feng, J Bai - Proceedings of the 32nd ACM …, 2024 - dl.acm.org
Pre-training & fine-tuning is a prevalent paradigm in computer vision (CV). Recently,
parameter-efficient transfer learning (PETL) methods have shown promising performance in …

End-to-end temporal action detection with 1B parameters across 1000 frames

S Liu, CL Zhang, C Zhao… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, temporal action detection (TAD) has seen significant performance improvement
with end-to-end training. However, due to the memory bottleneck, only models with limited …

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

N Zhou, J Chen, D Huang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The visual models pretrained on large-scale benchmarks encode general knowledge and
prove effective in building more powerful representations for downstream tasks. Most …