Parameter-efficient fine-tuning for large models: A comprehensive survey
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …
Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment
With the continuous growth in the number of parameters of transformer-based pretrained
language models (PLMs), particularly the emergence of large language models (LLMs) with …
SimDA: Simple diffusion adapter for efficient video generation
The recent wave of AI-generated content has witnessed the great development and success
of Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …
MA-SAM: Modality-agnostic SAM adaptation for 3D medical image segmentation
The Segment Anything Model (SAM), a foundation model for general image
segmentation, has demonstrated impressive zero-shot performance across numerous …
What makes good examples for visual in-context learning?
Large vision models with billions of parameters and trained on broad data have great
potential in numerous downstream applications. However, these models are typically difficult …
Vision transformers are parameter-efficient audio-visual learners
Vision transformers (ViTs) have achieved impressive results on various computer vision
tasks in the last several years. In this work, we study the capability of frozen ViTs, pretrained …
Bidirectional cross-modal knowledge exploration for video recognition with pre-trained vision-language models
Vision-language models (VLMs) pre-trained on large-scale image-text pairs have
demonstrated impressive transferability on various visual tasks. Transferring knowledge …
VindLU: A recipe for effective video-and-language pretraining
The last several years have witnessed remarkable progress in video-and-language (VidL)
understanding. However, most modern VidL approaches use complex and specialized …
DreamVideo: Composing your dream videos with customized subject and motion
Customized generation using diffusion models has made impressive progress in image
generation but remains unsatisfactory in the challenging video generation task as it requires …
UniFormerV2: Unlocking the potential of image ViTs for video understanding
The prolific performances of Vision Transformers (ViTs) in image tasks have prompted
research into adapting the image ViTs for video tasks. However, the substantial gap …