Sparse MLP for image recognition: Is self-attention really necessary?

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com

Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Cited by 73 · Related articles · All 7 versions

[PDF] arxiv.org

Piecewise linear neural networks and deep learning

Q Tao, L Li, X Huang, X Xi, S Wang… - Nature Reviews Methods …, 2022 - nature.com

As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …

Cited by 26 · Related articles · All 6 versions

[PDF] mlr.press

Understanding the robustness in vision transformers

D Zhou, Z Yu, E Xie, C Xiao… - International …, 2022 - proceedings.mlr.press

Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

Cited by 179 · Related articles · All 6 versions

[PDF] neurips.cc

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Cited by 204 · Related articles · All 6 versions

[PDF] thecvf.com

Adaptive frequency filters as efficient global token mixers

Z Huang, Z Zhang, C Lan, ZJ Zha… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable
successes in broad vision tasks thanks to their effective information fusion in the global …

Cited by 29 · Related articles · All 5 versions

[PDF] neurips.cc

Sequencer: Deep LSTM for image classification

Y Tatsunami, M Taki - Advances in Neural Information …, 2022 - proceedings.neurips.cc

In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly
revolutionized various architectural design efforts: ViT achieved state-of-the-art image …

Cited by 68 · Related articles · All 5 versions

[PDF] brunel.ac.uk

EMSN: An energy-efficient memristive sequencer network for human emotion classification in mental health monitoring

X Ji, Z Dong, Y Han, CS Lai, G Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Mental health problems are an increasingly common social issue severely affecting health
and well-being. Multimedia processing technologies via facial expression show appealing …

Cited by 49 · Related articles · All 5 versions

[PDF] mlr.press

DynaMixer: A vision MLP architecture with dynamic mixing

Z Wang, W Jiang, YM Zhu, L Yuan… - … on machine learning, 2022 - proceedings.mlr.press

Recently, MLP-like vision models have achieved promising performances on mainstream
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …

Cited by 36 · Related articles · All 5 versions

[PDF] arxiv.org

MambaMixer: Efficient selective state space models with dual token and channel selection

A Behrouz, M Santacatterina, R Zabih - arXiv preprint arXiv:2403.19888, 2024 - arxiv.org

Recent advances in deep learning have mainly relied on Transformers due to their data
dependency and ability to learn at scale. The attention module in these architectures …

Cited by 18 · Related articles · All 2 versions

[PDF] arxiv.org

MH-DETR: Video moment and highlight detection with cross-modal transformer

Y Xu, Y Sun, B Zhai, Y Jia, S Du - 2024 International Joint …, 2024 - ieeexplore.ieee.org

With the increasing demand for video understanding, video moment and highlight detection
(MHD) has emerged as a critical research topic. MHD aims to localize all moments and …

Cited by 21 · Related articles · All 2 versions