A theoretical understanding of shallow vision transformers: Learning, generalization, and sample complexity

H Li, M Wang, S Liu, PY Chen - arXiv preprint arXiv:2302.06015, 2023 - arxiv.org
Vision Transformers (ViTs) with self-attention modules have recently achieved great
empirical success in many vision tasks. Due to non-convex interactions across layers …
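
For orientation, the self-attention module the paper analyzes is, in its standard form, scaled dot-product attention. Below is a minimal NumPy sketch of that standard module; the weight shapes and names are illustrative background, not the paper's specific shallow-ViT analysis setup.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Standard scaled dot-product self-attention on a token matrix.

    X: (n_tokens, d_model); Wq, Wk, Wv: (d_model, d_k) projections.
    Illustrative background only, not the paper's analysis setup.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # token-pair similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # attention-weighted values
```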

Super-resolution neural operator

M Wei, X Zhang - Proceedings of the IEEE/CVF Conference …, 2023 - openaccess.thecvf.com
We propose Super-resolution Neural Operator (SRNO), a deep operator learning
framework that can resolve high-resolution (HR) images at arbitrary scales from the low …

Expediting large-scale vision transformer for dense prediction without fine-tuning

W Liang, Y Yuan, H Ding, X Luo… - Advances in …, 2022 - proceedings.neurips.cc
Vision transformers have recently achieved competitive results across various vision tasks
but still suffer from heavy computation costs when processing a large number of tokens …

Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification

J Leinonen, U Hamann, D Nerini, U Germann… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have been widely adopted in image generation, producing higher-quality
and more diverse samples than generative adversarial networks (GANs). We introduce a …
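
As background to the snippet's claim about diffusion models, the standard DDPM forward (noising) process has a closed form that most samplers build on. A hedged sketch follows; the linear beta schedule is a common illustrative choice, not the paper's latent-space configuration.

```python
import numpy as np

# Illustrative linear noise schedule (not the paper's configuration).
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative signal-retention products

def ddpm_forward_sample(x0, t):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    noise = np.random.randn(*x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
```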

Pastnet: Introducing physical inductive biases for spatio-temporal video prediction

H Wu, W Xiong, F Xu, X Luo, C Chen, XS Hua… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we investigate the challenge of spatio-temporal video prediction, which
involves generating future videos based on historical data streams. Existing approaches …

Koopman neural operator as a mesh-free solver of non-linear partial differential equations

W Xiong, X Huang, Z Zhang, R Deng, P Sun… - Journal of Computational …, 2024 - Elsevier
The lack of analytic solutions for diverse partial differential equations (PDEs) has given rise
to a range of computational techniques for numerical solutions. Although numerous recent …
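
For context, the Koopman operator behind such solvers is the standard lifting that turns a nonlinear flow into a linear one on observables. The identity below is textbook background, not the paper's specific operator parameterization.

```latex
% For dynamics x_{t+1} = F(x_t), the Koopman operator \mathcal{K} acts
% linearly on observables g by composition with the flow:
\[
  (\mathcal{K} g)(x) = g\bigl(F(x)\bigr)
  \quad\Longrightarrow\quad
  g(x_{t+1}) = (\mathcal{K} g)(x_t),
\]
% so the nonlinear evolution becomes linear in the lifted space of
% observables, which is what a learned Koopman operator exploits.
```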

SiMBA: Simplified Mamba-based architecture for vision and multivariate time series

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2403.15360, 2024 - arxiv.org
Transformers have widely adopted attention networks for sequence mixing and MLPs for
channel mixing, playing a pivotal role in achieving breakthroughs across domains. However …
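
The snippet's contrast between sequence (token) mixing and channel mixing can be made concrete with a generic mixer-style block. The sketch below uses a plain linear token mixer as a stand-in; SiMBA itself replaces this with a Mamba-based sequence mixer.

```python
import numpy as np

def mixer_block(X, W_tok, W1, W2):
    """Generic token/channel mixing split, as contrasted in the snippet.

    X: (n_tokens, d_model); W_tok: (n_tokens, n_tokens) token mixer;
    W1: (d_model, d_hidden), W2: (d_hidden, d_model) channel MLP.
    A stand-in sketch -- not SiMBA's Mamba-based sequence mixer.
    """
    X = X + W_tok @ X                 # sequence mixing: blend across tokens
    hidden = np.maximum(X @ W1, 0.0)  # channel mixing: per-token MLP (ReLU)
    return X + hidden @ W2
```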

Scattering vision transformer: Spectral mixing matters

B Patro, V Agneeswaran - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …

Transformer Meets Boundary Value Inverse Problem

R Guo, S Cao - International Conference on Learning Representations, 2023 - par.nsf.gov
A Transformer-based deep direct sampling method is proposed for electrical impedance
tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A …

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

BN Patro, VP Namboodiri, VS Agneeswaran - arXiv preprint arXiv …, 2023 - arxiv.org
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …
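
To illustrate what frequency mixing means here, a GFNet-style spectral layer applies an FFT over tokens, a learnable point-wise filter, and an inverse FFT. The sketch below is that generic pattern, not necessarily SpectFormer's exact spectral block.

```python
import numpy as np

def frequency_mixing(X, spectral_filter):
    """GFNet-style spectral token mixing (illustrative, not SpectFormer's
    exact layer).

    X: (n_tokens, d_model); spectral_filter: complex weights of shape
    (n_tokens // 2 + 1, d_model), learned in practice.
    """
    Xf = np.fft.rfft(X, axis=0)                    # tokens -> frequency domain
    Xf = Xf * spectral_filter                      # learnable point-wise filter
    return np.fft.irfft(Xf, n=X.shape[0], axis=0)  # back to token domain
```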