Are we ready for a new paradigm shift? a survey on visual deep mlp
Recently, the proposed deep multilayer perceptron (MLP) models have stirred up a lot of
interest in the vision community. Historically, the availability of larger datasets combined with …
interest in the vision community. Historically, the availability of larger datasets combined with …
Piecewise linear neural networks and deep learning
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …
successful in various fields, most recently in deep learning. To apply PWLNN methods, both …
Understanding the robustness in vision transformers
Recent studies show that Vision Transformers (ViTs) exhibit strong robustness against
various corruptions. Although this property is partly attributed to the self-attention …
various corruptions. Although this property is partly attributed to the self-attention …
Focal modulation networks
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …
completely replaced by a focal modulation module for modeling token interactions in vision …
Adaptive frequency filters as efficient global token mixers
Recent vision transformers, large-kernel CNNs and MLPs have attained remarkable
successes in broad vision tasks thanks to their effective information fusion in the global …
successes in broad vision tasks thanks to their effective information fusion in the global …
Sequencer: Deep lstm for image classification
Y Tatsunami, M Taki - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In recent computer vision research, the advent of the Vision Transformer (ViT) has rapidly
revolutionized various architectural design efforts: ViT achieved state-of-the-art image …
revolutionized various architectural design efforts: ViT achieved state-of-the-art image …
EMSN: An energy-efficient memristive sequencer network for human emotion classification in mental health monitoring
Mental health problems are an increasingly common social issue severely affecting health
and well-being. Multimedia processing technologies via facial expression show appealing …
and well-being. Multimedia processing technologies via facial expression show appealing …
Dynamixer: a vision mlp architecture with dynamic mixing
Recently, MLP-like vision models have achieved promising performances on mainstream
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …
visual recognition tasks. In contrast with vision transformers and CNNs, the success of MLP …
Mambamixer: Efficient selective state space models with dual token and channel selection
Recent advances in deep learning have mainly relied on Transformers due to their data
dependency and ability to learn at scale. The attention module in these architectures …
dependency and ability to learn at scale. The attention module in these architectures …
Mh-detr: Video moment and highlight detection with cross-modal transformer
Y Xu, Y Sun, B Zhai, Y Jia, S Du - 2024 International Joint …, 2024 - ieeexplore.ieee.org
With the increasing demand for video understanding, video moment and highlight detection
(MHD) has emerged as a critical research topic. MHD aims to localize all moments and …
(MHD) has emerged as a critical research topic. MHD aims to localize all moments and …