VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Salient object detection (SOD) and camouflaged object detection (COD) are related yet
distinct binary mapping tasks. These tasks involve multiple modalities sharing …
Domain prompt learning with quaternion networks
Prompt learning has emerged as an effective and data-efficient technique in large Vision-
Language Models (VLMs). However, when adapting VLMs to specialized domains such as …
Medical supervised masked autoencoder: Crafting a better masking strategy and efficient fine-tuning schedule for medical image classification
J Mao, S Guo, X Yin, Y Chang, B Nie, Y Wang - Applied Soft Computing, 2024 - Elsevier
Recently, masked autoencoders (MAEs) have displayed great potential in many visual tasks.
However, in medical image classification tasks, most human tissue structures are highly …
Sparse-Tuning: Adapting vision transformers with efficient fine-tuning and inference
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-
trained Vision Transformer (ViT) models to downstream applications. While current PEFT …
Neural network developments: A detailed survey from static to dynamic models
Dynamic Neural Networks (DNNs) are an evolving research field within deep
learning (DL), offering a robust, adaptable, and efficient alternative to the conventional Static …
Empowering Object Detection: Unleashing the Potential of Decoupled and Interactive Distillation
F Qian, J Hong, H Yan, H Chen… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Deploying state-of-the-art object detectors on resource-limited devices presents significant
challenges. Knowledge distillation is an efficient and streamlined lightweight technique to …
Enat: Rethinking spatial-temporal interactions in token-based image synthesis
Recently, token-based generation has demonstrated its effectiveness in image synthesis.
As a representative example, non-autoregressive Transformers (NATs) can generate decent …
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs
Vision-language models (VLMs) have shown remarkable success across various multi-
modal tasks, yet large VLMs encounter significant efficiency challenges due to processing …
A Novel Transformer Network Based on Cross–Spatial Learning and Deformable Attention for Composite Fault Diagnosis of Agricultural Machinery Bearings
X Li, M Li, B Liu, S Lv, C Liu - Agriculture, 2024 - mdpi.com
Diagnosing agricultural machinery faults is critical to agricultural automation, and identifying
vibration signals from faulty bearings is important for agricultural machinery fault diagnosis …
Multiple-Exit Tuning: Towards Inference-Efficient Adaptation for Vision Transformer
Z Liu, J Zhu, N Li, G Huang - arXiv preprint arXiv:2409.13999, 2024 - arxiv.org
Parameter-efficient transfer learning (PETL) has shown great potential in adapting a vision
transformer (ViT) pre-trained on large-scale datasets to various downstream tasks. Existing …