Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, J Zhang, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment

L Xu, H Xie, SZJ Qin, X Tao, FL Wang - arXiv preprint arXiv:2312.12148, 2023 - arxiv.org
With the continuous growth in the number of parameters of transformer-based pretrained
language models (PLMs), particularly the emergence of large language models (LLMs) with …

SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has seen the rapid development and success
of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …

MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation

C Chen, J Miao, D Wu, A Zhong, Z Yan, S Kim… - Medical Image …, 2024 - Elsevier
The Segment Anything Model (SAM), a foundation model for general image
segmentation, has demonstrated impressive zero-shot performance across numerous …

What makes good examples for visual in-context learning?

Y Zhang, K Zhou, Z Liu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Large vision models with billions of parameters and trained on broad data have great
potential in numerous downstream applications. However, these models are typically difficult …

Vision transformers are parameter-efficient audio-visual learners

YB Lin, YL Sung, J Lei, M Bansal… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision transformers (ViTs) have achieved impressive results on various computer vision
tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained …

Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models

W Wu, X Wang, H Luo, J Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …

VindLU: A recipe for effective video-and-language pretraining

F Cheng, X Wang, J Lei, D Crandall… - Proceedings of the …, 2023 - openaccess.thecvf.com
The last several years have witnessed remarkable progress in video-and-language (VidL)
understanding. However, most modern VidL approaches use complex and specialized …

DreamVideo: Composing your dream videos with customized subject and motion

Y Wei, S Zhang, Z Qing, H Yuan, Z Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customized generation using diffusion models has made impressive progress in image
generation but remains unsatisfactory in the challenging video generation task as it requires …

UniFormerV2: Unlocking the potential of image ViTs for video understanding

K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The strong performance of Vision Transformers (ViTs) on image tasks has prompted
research into adapting image ViTs to video tasks. However, the substantial gap …