VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning

Z Luo, N Liu, W Zhao, X Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Salient object detection (SOD) and camouflaged object detection (COD) are related yet
distinct binary mapping tasks. These tasks involve multiple modalities sharing …

Domain prompt learning with quaternion networks

Q Cao, Z Xu, Y Chen, C Ma… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Prompt learning has emerged as an effective and data-efficient technique in large Vision-Language
Models (VLMs). However, when adapting VLMs to specialized domains such as …

Medical supervised masked autoencoder: Crafting a better masking strategy and efficient fine-tuning schedule for medical image classification

J Mao, S Guo, X Yin, Y Chang, B Nie, Y Wang - Applied Soft Computing, 2024 - Elsevier
Recently, masked autoencoders (MAEs) have displayed great potential in many visual tasks.
However, in medical image classification tasks, most human tissue structures are highly …
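The entry above concerns masking strategies for masked autoencoders. For orientation only, here is a minimal Python sketch of the generic uniform random patch masking used in the standard MAE recipe. It is not the supervised masking strategy this paper proposes. A random subset of patch indices stays visible to the encoder and the rest are masked for reconstruction. The helper name `random_patch_mask` and its signature are hypothetical. The 0.75 default mirrors the masking ratio commonly used for MAE pre-training.

```python
import random

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into visible and masked sets using uniform random
    masking (generic MAE-style; hypothetical helper, not the paper's method).
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch positions
    num_keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(order[:num_keep])  # patches the encoder sees
    masked = sorted(order[num_keep:])   # patches the decoder must reconstruct
    return visible, masked

# A 224x224 image split into 16x16 patches gives 14 * 14 = 196 patches.
visible, masked = random_patch_mask(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

Changing the masking strategy amounts to changing how the kept indices are chosen. The visible/masked split itself stays the same.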

Sparse-Tuning: Adapting vision transformers with efficient fine-tuning and inference

T Liu, X Liu, S Huang, L Shi, Z Xu, Y Xin, Q Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
Parameter-efficient fine-tuning (PEFT) has emerged as a popular solution for adapting pre-trained
Vision Transformer (ViT) models to downstream applications. While current PEFT …

Neural network developments: A detailed survey from static to dynamic models

PR Verma, NP Singh, D Pantola, X Cheng - Computers and Electrical …, 2024 - Elsevier
Dynamic Neural Networks (DNNs) are an evolving research field within deep
learning (DL), offering a robust, adaptable, and efficient alternative to the conventional Static …

Empowering Object Detection: Unleashing the Potential of Decoupled and Interactive Distillation

F Qian, J Hong, H Yan, H Chen… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Deploying state-of-the-art object detectors on resource-limited devices presents significant
challenges. Knowledge distillation is an efficient and streamlined lightweight technique to …
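The snippet above only names knowledge distillation. For readers unfamiliar with it, a minimal Python sketch of the widely used temperature-scaled soft-target distillation loss follows. It is a generic illustration, not the decoupled and interactive distillation scheme of this paper. The function names and the default temperature of 4.0 are illustrative assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """Soft-target distillation loss: T^2 * KL(teacher || student).

    Both distributions are softened with the same temperature T; the T^2
    factor keeps gradient magnitudes comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl

teacher = [2.0, 1.0, 0.1]
print(kd_loss(teacher, teacher))              # 0.0 when the student matches the teacher
print(kd_loss([0.1, 1.0, 2.0], teacher) > 0)  # True: mismatched predictions are penalized
```

In a training loop this term is typically combined with the ordinary cross-entropy on ground-truth labels through a weighting coefficient.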

ENAT: Rethinking spatial-temporal interactions in token-based image synthesis

Z Ni, Y Wang, R Zhou, Y Han, J Guo, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, token-based generation has demonstrated its effectiveness in image synthesis.
As a representative example, non-autoregressive Transformers (NATs) can generate decent …

A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs

W Zhao, Y Han, J Tang, Z Li, Y Song, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models (VLMs) have shown remarkable success across various multi-modal
tasks, yet large VLMs encounter significant efficiency challenges due to processing …

A Novel Transformer Network Based on Cross–Spatial Learning and Deformable Attention for Composite Fault Diagnosis of Agricultural Machinery Bearings

X Li, M Li, B Liu, S Lv, C Liu - Agriculture, 2024 - mdpi.com
Diagnosing agricultural machinery faults is critical to agricultural automation, and identifying
vibration signals from faulty bearings is important for agricultural machinery fault diagnosis …

Multiple-Exit Tuning: Towards Inference-Efficient Adaptation for Vision Transformer

Z Liu, J Zhu, N Li, G Huang - arXiv preprint arXiv:2409.13999, 2024 - arxiv.org
Parameter-efficient transfer learning (PETL) has shown great potential in adapting a vision
transformer (ViT) pre-trained on large-scale datasets to various downstream tasks. Existing …
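Multiple-exit models attach classifiers at intermediate depths so that easy inputs can stop early. The Python sketch below illustrates only the generic confidence-threshold early-exit rule at inference time. It is not the tuning method of this paper. The function `early_exit_predict`, its input format (an iterable of per-exit logit vectors ordered from shallow to deep), and the 0.9 threshold are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(exit_logits, threshold=0.9):
    """Return (predicted_class, exit_index) from the first exit whose top
    softmax probability reaches `threshold`, else from the last exit.

    `exit_logits` is any iterable of per-exit logit vectors ordered from the
    shallowest exit to the final classifier (hypothetical input format).
    """
    result = None
    for index, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        result = (probs.index(confidence), index)
        if confidence >= threshold:
            return result  # confident enough: deeper exits are never consumed
    if result is None:
        raise ValueError("at least one exit is required")
    return result  # no exit was confident: fall back to the final classifier

exits = [[0.1, 0.2, 0.0], [0.0, 5.0, 0.0], [10.0, 0.0, 0.0]]
print(early_exit_predict(exits, threshold=0.9))    # (1, 1)
print(early_exit_predict(exits, threshold=0.999))  # (0, 2)
```

If you pass a generator that computes each exit's logits on demand, deeper blocks are never evaluated once an exit is confident. That skipped computation is where the inference savings of early exiting come from.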