VL-PET: Vision-and-language parameter-efficient tuning via granularity control
As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …
Parameter-efficient transfer learning for remote sensing image-text retrieval
Vision-and-language pretraining (VLP) models have experienced a surge in popularity
recently. By fine-tuning them on specific datasets, significant performance improvements …
Cross-modal adapter for text-video retrieval
Text-video retrieval is an important multi-modal learning task, where the goal is to retrieve
the most relevant video for a given text query. Recently, pre-trained models, e.g., CLIP, show …
Rethinking vision transformer and masked autoencoder in multimodal face anti-spoofing
Recently, vision transformer (ViT) based multimodal learning methods have been proposed
to improve the robustness of face anti-spoofing (FAS) systems. However, there are still no …
KVQ: Kwai video quality assessment for short-form videos
Short-form UGC video platforms like Kwai and TikTok have become an emerging and
irreplaceable mainstream media form, thriving on user-friendly engagement and …
End-to-end temporal action detection with 1B parameters across 1000 frames
Recently, temporal action detection (TAD) has seen significant performance improvement
with end-to-end training. However, due to the memory bottleneck, only models with limited …
DR-Tune: Improving Fine-tuning of Pretrained Visual Models by Distribution Regularization with Semantic Calibration
The visual models pretrained on large-scale benchmarks encode general knowledge and
prove effective in building more powerful representations for downstream tasks. Most …
Dec-Adapter: Exploring efficient decoder-side adapter for bridging screen content and natural image compression
Natural image compression has been greatly improved in the deep learning era. However,
the compression performance will be heavily degraded if the pretrained encoder is directly …
PECoP: Parameter efficient continual pretraining for action quality assessment
The limited availability of labelled data in Action Quality Assessment (AQA) has forced
previous works to fine-tune their models pretrained on large-scale domain-general datasets …
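Several entries above (VL-PET, Cross-modal adapter, Dec-Adapter, PECoP) build on adapter-style parameter-efficient tuning. For orientation, here is a minimal sketch of a generic bottleneck adapter in PyTorch; the class name, bottleneck width, and the placement after a frozen encoder layer are illustrative assumptions, not the method of any paper listed.

```python
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Generic bottleneck adapter sketch: down-project -> nonlinearity ->
    up-project with a residual connection, inserted into a frozen
    pretrained backbone so that only these small weights are trained."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Zero-init the up-projection so the adapted model starts out as
        # the identity over the pretrained features.
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))


# Typical usage: freeze the backbone, optimize only adapter parameters.
backbone = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False
adapter = BottleneckAdapter(hidden_dim=512)

x = torch.randn(2, 16, 512)   # (batch, tokens, hidden)
y = adapter(backbone(x))      # adapter refines the frozen features
trainable = sum(p.numel() for p in adapter.parameters())
print(f"trainable adapter params: {trainable}")
```

The design point these papers share is that the pretrained backbone stays frozen while only the small residual module, typically a small fraction of the backbone's parameters, is optimized per downstream task.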