A theoretical understanding of shallow vision transformers: Learning, generalization, and sample complexity

H Li, M Wang, S Liu, PY Chen - arXiv preprint arXiv:2302.06015, 2023 - arxiv.org
Vision Transformers (ViTs) with self-attention modules have recently achieved great
empirical success in many vision tasks. Due to non-convex interactions across layers …
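
For orientation, the self-attention module the paper analyzes is, in its standard form, scaled dot-product attention. Below is a minimal NumPy sketch of that standard module; the weight shapes and names are illustrative background, not the paper's specific shallow-ViT analysis setup.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention on a token matrix.

    X: (n_tokens, d_model); Wq, Wk, Wv: (d_model, d_k) projections.
    Illustrative background only, not the paper's analysis setup.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # token-pair similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted values
```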

Super-resolution neural operator

M Wei, X Zhang - Proceedings of the IEEE/CVF Conference …, 2023 - openaccess.thecvf.com
We propose Super-resolution Neural Operator (SRNO), a deep operator learning
framework that can resolve high-resolution (HR) images at arbitrary scales from the low …

Expediting large-scale vision transformer for dense prediction without fine-tuning

W Liang, Y Yuan, H Ding, X Luo… - Advances in …, 2022 - proceedings.neurips.cc
Vision transformers have recently achieved competitive results across various vision tasks
but still suffer from heavy computation costs when processing a large number of tokens …

Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification

J Leinonen, U Hamann, D Nerini, U Germann… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have been widely adopted in image generation, producing higher-quality
and more diverse samples than generative adversarial networks (GANs). We introduce a …
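
As background to the snippet's claim about diffusion models, the standard DDPM forward (noising) process has a closed form that most samplers build on. A hedged sketch follows; the linear beta schedule is a common illustrative choice, not the paper's latent-space configuration.

```python
import numpy as np

# Illustrative linear noise schedule (not the paper's configuration).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention products

def ddpm_forward_sample(x0, t):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = np.random.randn(*x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
```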

Pastnet: Introducing physical inductive biases for spatio-temporal video prediction

H Wu, W Xiong, F Xu, X Luo, C Chen, XS Hua… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we investigate the challenge of spatio-temporal video prediction, which
involves generating future videos based on historical data streams. Existing approaches …

Koopman neural operator as a mesh-free solver of non-linear partial differential equations

W Xiong, X Huang, Z Zhang, R Deng, P Sun… - Journal of Computational …, 2024 - Elsevier
The lack of analytic solutions for diverse partial differential equations (PDEs) has given rise
to a range of computational techniques for numerical solutions. Although numerous recent …
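
For context, the Koopman operator behind such solvers is the standard lifting that turns a nonlinear flow into a linear one on observables. The identity below is textbook background, not the paper's specific operator parameterization.

```latex
% For dynamics x_{t+1} = F(x_t), the Koopman operator \mathcal{K} acts
% linearly on observables g by composition with the flow:
\[
  (\mathcal{K} g)(x) = g\bigl(F(x)\bigr)
  \quad\Longrightarrow\quad
  g(x_{t+1}) = (\mathcal{K} g)(x_t),
\]
% so the nonlinear evolution becomes linear in the lifted space of
% observables, which is what a learned Koopman operator exploits.
```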

SiMBA: Simplified Mamba-based architecture for vision and multivariate time series

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2403.15360, 2024 - arxiv.org
Transformers have widely adopted attention networks for sequence mixing and MLPs for
channel mixing, playing a pivotal role in achieving breakthroughs across domains. However …
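
The snippet's contrast between sequence (token) mixing and channel mixing can be made concrete with a generic mixer-style block. The sketch below uses a plain linear token mixer as a stand-in; SiMBA itself replaces this with a Mamba-based sequence mixer.

```python
import numpy as np

def mixer_block(X, W_tok, W1, W2):
    """Generic token/channel mixing split, as contrasted in the snippet.

    X: (n_tokens, d_model); W_tok: (n_tokens, n_tokens) token mixer;
    W1: (d_model, d_hidden), W2: (d_hidden, d_model) channel MLP.
    A stand-in sketch -- not SiMBA's Mamba-based sequence mixer.
    """
    X = X + W_tok @ X                 # sequence mixing: blend across tokens
    hidden = np.maximum(X @ W1, 0.0)  # channel mixing: per-token MLP (ReLU)
    return X + hidden @ W2
```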

Scattering vision transformer: Spectral mixing matters

B Patro, V Agneeswaran - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …

Transformer Meets Boundary Value Inverse Problem

R Guo, S Cao - International Conference on Learning Representations, 2023 - par.nsf.gov
A Transformer-based deep direct sampling method is proposed for electrical impedance
tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A …

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

BN Patro, VP Namboodiri, VS Agneeswaran - arXiv preprint arXiv …, 2023 - arxiv.org
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …
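
To illustrate what frequency mixing means here, a GFNet-style spectral layer applies an FFT over tokens, a learnable point-wise filter, and an inverse FFT. The sketch below is that generic pattern, not necessarily SpectFormer's exact spectral block.

```python
import numpy as np

def frequency_mixing(X, spectral_filter):
    """GFNet-style spectral token mixing (illustrative, not SpectFormer's
    exact layer).

    X: (n_tokens, d_model); spectral_filter: complex weights of shape
    (n_tokens // 2 + 1, d_model), learned in practice.
    """
    Xf = np.fft.rfft(X, axis=0)                    # tokens -> frequency domain
    Xf = Xf * spectral_filter                      # learnable point-wise filter
    return np.fft.irfft(Xf, n=X.shape[0], axis=0)  # back to token domain
```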