SoT: Delving deeper into classification head for transformer

CH Song, J Yoon, S Choi… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

The explosive increase in vision transformers studies has shown remarkable progress in
vision tasks such as image classification and detection. However, in instance-level image …

Cited by 23 · Related articles · All 6 versions

A comparative evaluation between convolutional neural networks and vision transformers for COVID-19 detection

SI Nafisah, G Muhammad, MS Hossain, SA AlQahtani - Mathematics, 2023 - mdpi.com

Early illness detection enables medical professionals to deliver the best care and increases
the likelihood of a full recovery. In this work, we show that computer-aided design (CAD) …

Cited by 13 · Related articles · All 6 versions

DKT: Diverse knowledge transfer transformer for class incremental learning

X Gao, Y He, S Dong, J Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

Deep neural networks suffer from catastrophic forgetting in class incremental learning,
where the classification accuracy of old classes drastically deteriorates when the networks …

Cited by 5 · Related articles · All 3 versions

NOAH: Learning Pairwise Object Category Attentions for Image Classification

C Li, A Zhou, A Yao - arXiv preprint arXiv:2402.02377, 2024 - arxiv.org

A modern deep neural network (DNN) for image classification tasks typically consists of two
parts: a backbone for feature extraction, and a head for feature encoding and class …

Cited by 2 · Related articles · All 2 versions

Towards a Deeper Understanding of Global Covariance Pooling in Deep Learning: An Optimization Perspective

Q Wang, Z Zhang, M Gao, J Xie, P Zhu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Global covariance pooling (GCP) as an effective alternative to global average pooling has
shown good capacity to improve deep convolutional neural networks (CNNs) in a variety of …

Cited by 2 · Related articles · All 6 versions

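Multimodal video action recognition method based on language-vision contrastive learning

Y Zhang, B Zhang, W Dong, F An, J Zhang, Q Zhang - Acta Automatica Sinica, 2024 - aas.net.cn

Building on the contrastive language-image pre-training (CLIP) model, a multimodal model
for video action recognition is proposed. Starting from the temporal modeling of the visual encoder and the language descriptions of action categories, the model …

Cited by 1 · Related articles · All 2 versions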

High-order correlation network for video recognition

W Dong, Z Wang, B Zhang, J Zhang… - 2022 International Joint …, 2022 - ieeexplore.ieee.org

How to model global video representation is an important research content of video
recognition. Among current convolutional neural network (CNN) based methods, only using …

Cited by 3 · Related articles

Remote Sensing Scene Classification via Second-order Differentiable Token Transformer Network

K Ni, Q Wu, S Li, Z Zheng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

The vision transformer has been widely applied in remote sensing image scene
classification due to its excellent ability to capture global features. However, remote sensing …

PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification

Z Wang, Q Sun, B Zhang, P Wang, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Few-shot learning has been successfully applied to medical image classification as only
very few medical examples are available for training. Due to the challenging problem of …

Optimization of neural networks for deep learning and applications to CT image segmentation

G Pezzano - 2023 - diposit.ub.edu

During the last few years, AI development in deep learning has been going so fast that
even important researchers, politicians, and entrepreneurs are signing petitions to try to slow …