Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to achieve promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

VL-PET: Vision-and-language parameter-efficient tuning via granularity control

ZY Hu, Y Li, MR Lyu, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …

Parameter-efficient transfer learning for remote sensing image-text retrieval

Y Yuan, Y Zhan, Z Xiong - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …

Cross-modal adapter for text-video retrieval

H Jiang, J Zhang, R Huang, C Ge, Z Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org
Text-video retrieval is an important multi-modal learning task, where the goal is to retrieve
the most relevant video for a given text query. Recently, pre-trained models, e.g., CLIP, show …

Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing

Z Yu, R Cai, Y Cui, X Liu, Y Hu, AC Kot - International Journal of Computer …, 2024 - Springer
Recently, vision transformer (ViT) based multimodal learning methods have been proposed
to improve the robustness of face anti-spoofing (FAS) systems. However, there are still no …

KVQ: Kwai video quality assessment for short-form videos

Y Lu, X Li, Y Pei, K Yuan, Q Xie, Y Qu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Short-form UGC video platforms like Kwai and TikTok have been an emerging and
irreplaceable mainstream media form thriving on user-friendly engagement and …

End-to-end temporal action detection with 1B parameters across 1000 frames

S Liu, CL Zhang, C Zhao… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recently, temporal action detection (TAD) has seen significant performance improvement
with end-to-end training. However, due to the memory bottleneck, only models with limited …

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

N Zhou, J Chen, D Huang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
The visual models pretrained on large-scale benchmarks encode general knowledge and
prove effective in building more powerful representations for downstream tasks. Most …

Dec-Adapter: Exploring efficient decoder-side adapter for bridging screen content and natural image compression

S Shen, H Yue, J Yang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Natural image compression has been greatly improved in the deep learning era. However,
the compression performance will be heavily degraded if the pretrained encoder is directly …

PECoP: Parameter-efficient continual pretraining for action quality assessment

A Dadashzadeh, S Duan, A Whone… - Proceedings of the …, 2024 - openaccess.thecvf.com
The limited availability of labelled data in Action Quality Assessment (AQA) has forced
previous works to fine-tune their models pretrained on large-scale domain-general datasets …