Are we ready for a new paradigm shift? A survey on visual deep MLP

R Liu, Y Li, L Tao, D Liang, HT Zheng - Patterns, 2022 - cell.com
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …

Piecewise linear neural networks and deep learning

Q Tao, L Li, X Huang, X Xi, S Wang… - Nature Reviews Methods …, 2022 - nature.com
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …

Understanding the robustness in vision transformers

D Zhou, Z Yu, E Xie, C Xiao… - International …, 2022 - proceedings.mlr.press
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Adaptive frequency filters as efficient global token mixers

Z Huang, Z Zhang, C Lan, ZJ Zha… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable
successes in broad vision tasks thanks to their effective information fusion in the global …

Sequencer: Deep LSTM for image classification

Y Tatsunami, M Taki - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly
revolutionized various architectural design efforts: ViT achieved state-of-the-art image …

EMSN: An energy-efficient memristive sequencer network for human emotion classification in mental health monitoring

X Ji, Z Dong, Y Han, CS Lai, G Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Mental health problems are an increasingly common social issue severely affecting health
and well-being. Multimedia processing technologies via facial expression show appealing …

DynaMixer: A vision MLP architecture with dynamic mixing

Z Wang, W Jiang, YM Zhu, L Yuan… - … on machine learning, 2022 - proceedings.mlr.press
Recently, MLP-like vision models have achieved promising performances on mainstream
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …

MambaMixer: Efficient selective state space models with dual token and channel selection

A Behrouz, M Santacatterina, R Zabih - arXiv preprint arXiv:2403.19888, 2024 - arxiv.org
Recent advances in deep learning have mainly relied on Transformers due to their data
dependency and ability to learn at scale. The attention module in these architectures …

MH-DETR: Video moment and highlight detection with cross-modal transformer

Y Xu, Y Sun, B Zhai, Y Jia, S Du - 2024 International Joint …, 2024 - ieeexplore.ieee.org
With the increasing demand for video understanding, video moment and highlight detection
(MHD) has emerged as a critical research topic. MHD aims to localize all moments and …