Conv-Adapter: Exploring parameter efficient transfer learning for ConvNets

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org

Fine-tuning visual models has been widely shown to deliver promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

Cited by 29 · All 4 versions
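Parameter-efficient transfer learning for ConvNets trains a small added module while the pre-trained backbone stays frozen. As a rough illustration only, the sketch below counts the parameters of a generic bottleneck adapter made of two 1×1 convolutions (a hypothetical channel reduction factor of 4) and compares it with a single 3×3 convolution over 256 channels. The adapter layout, the helper names, and all sizes are assumptions for illustration, not the architecture proposed in the listed paper.

```python
def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """Parameter count of a standard (non-grouped) k x k 2D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def bottleneck_adapter_params(channels: int, reduction: int) -> int:
    """Hypothetical adapter: 1x1 down-projection, nonlinearity, 1x1 up-projection.

    The nonlinearity has no parameters, so only the two 1x1 convs are counted.
    """
    hidden = channels // reduction
    down = conv2d_params(channels, hidden, 1)
    up = conv2d_params(hidden, channels, 1)
    return down + up


if __name__ == "__main__":
    # Assumed sizes: one frozen 3x3 conv layer with 256 input/output channels.
    backbone_layer = conv2d_params(256, 256, 3)
    adapter = bottleneck_adapter_params(256, reduction=4)
    print(f"backbone layer: {backbone_layer:,} params")
    print(f"adapter:        {adapter:,} params ({adapter / backbone_layer:.1%} of the layer)")
```

With these assumed sizes the adapter adds 33,088 trainable parameters against 590,080 in the frozen layer, about 5.6%.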

[PDF] thecvf.com

VL-PET: Vision-and-language parameter-efficient tuning via granularity control

ZY Hu, Y Li, MR Lyu, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …

Cited by 16 · All 5 versions
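The VL-PET snippet notes that full fine-tuning becomes prohibitively expensive for storage as models grow. A back-of-the-envelope sketch makes the arithmetic concrete, assuming hypothetical sizes (a 100M-parameter backbone, 10 downstream tasks, and 1M task-specific parameters per task): full fine-tuning saves one complete model per task, whereas a parameter-efficient approach saves one shared frozen backbone plus a small module per task. The function names and numbers are illustrative assumptions, not figures from the paper.

```python
def full_finetune_storage(base_params: int, num_tasks: int) -> int:
    """Parameters stored when every task keeps its own fully fine-tuned copy."""
    return base_params * num_tasks


def peft_storage(base_params: int, num_tasks: int, task_params: int) -> int:
    """Parameters stored with one shared frozen backbone plus per-task modules."""
    return base_params + task_params * num_tasks


if __name__ == "__main__":
    # Hypothetical sizes chosen only to illustrate the scaling.
    base, tasks, per_task = 100_000_000, 10, 1_000_000
    print(f"full fine-tuning: {full_finetune_storage(base, tasks):,} params")
    print(f"shared backbone + modules: {peft_storage(base, tasks, per_task):,} params")
```

Under these assumed numbers, full fine-tuning stores 1,000,000,000 parameters versus 110,000,000 for the shared-backbone approach, and the gap widens with every additional task.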

[PDF] arxiv.org

When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system

H Zhang, JJ Xu, HW Cui, L Li, Y Yang… - … and Remote Sensing …, 2024 - ieeexplore.ieee.org

Artificial intelligence (AI) has significantly advanced Earth sciences, yet its full potential in
comprehensively modeling Earth's complex dynamics remains unrealized. Geoscience …

Cited by 7 · All 2 versions

[PDF] arxiv.org

Parameter-efficient transfer learning for remote sensing image-text retrieval

Y Yuan, Y Zhan, Z Xiong - IEEE Transactions on Geoscience …, 2023 - ieeexplore.ieee.org

Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …

Cited by 37 · All 4 versions

[PDF] springer.com

Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing

Z Yu, R Cai, Y Cui, X Liu, Y Hu, AC Kot - International Journal of Computer …, 2024 - Springer

Recently, vision transformer (ViT) based multimodal learning methods have been proposed
to improve the robustness of face anti-spoofing (FAS) systems. However, there are still no …

Cited by 24 · All 3 versions

[PDF] thecvf.com

KVQ: Kwai video quality assessment for short-form videos

Y Lu, X Li, Y Pei, K Yuan, Q Xie, Y Qu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Short-form UGC video platforms like Kwai and TikTok have been an emerging and
irreplaceable mainstream media form thriving on user-friendly engagement and …

Cited by 6

[PDF] arxiv.org

Cross-modal adapter for text-video retrieval

H Jiang, J Zhang, R Huang, C Ge, Z Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org

Text-video retrieval is an important multi-modal learning task, where the goal is to retrieve
the most relevant video for a given text query. Recently, pre-trained models, e.g., CLIP, show …

Cited by 41 · All 2 versions

[PDF] acm.org

Parameter-efficient is not sufficient: Exploring parameter, memory, and time efficient adapter tuning for dense predictions

D Yin, X Han, B Li, H Feng, J Bai - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

Pre-training & fine-tuning is a prevalent paradigm in computer vision (CV). Recently,
parameter-efficient transfer learning (PETL) methods have shown promising performance in …

Cited by 20 · All 2 versions

[PDF] thecvf.com

End-to-end temporal action detection with 1B parameters across 1000 frames

S Liu, CL Zhang, C Zhao… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Recently, temporal action detection (TAD) has seen significant performance improvements
with end-to-end training. However, due to the memory bottleneck, only models with limited …

Cited by 18 · All 4 versions

[PDF] thecvf.com

DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration

N Zhou, J Chen, D Huang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Visual models pretrained on large-scale benchmarks encode general knowledge and
prove effective in building more powerful representations for downstream tasks. Most …

Cited by 4 · All 5 versions