Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Sparked by the breakthrough of ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Scalable transformer for PDE surrogate modeling

Z Li, D Shu, A Barati Farimani - Advances in Neural …, 2024 - proceedings.neurips.cc
Transformer has shown state-of-the-art performance on various applications and has
recently emerged as a promising tool for surrogate modeling of partial differential equations …

Primal-attention: Self-attention through asymmetric kernel SVD in primal representation

Y Chen, Q Tao, F Tonin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, a new line of work has emerged to understand and improve self-attention in
Transformers by treating it as a kernel machine. However, existing works apply the methods …
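
To make that kernel-machine reading concrete, the following is a minimal sketch in plain NumPy (toy random projections, no training) of the standard view the abstract alludes to: softmax self-attention rewritten as a row-normalized asymmetric kernel smoother. It is not an implementation of the Primal-Attention SVD method itself.

```python
# Illustrative only: the "attention as a kernel machine" view, not Primal-Attention.
import numpy as np

def kernel_view_attention(X, Wq, Wk, Wv):
    """Softmax self-attention written as a normalized asymmetric kernel smoother."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    d = Q.shape[-1]
    kernel = np.exp(Q @ K.T / np.sqrt(d))      # asymmetric kernel k(q_i, k_j)
    weights = kernel / kernel.sum(axis=-1, keepdims=True)  # row-normalization = softmax
    return weights @ V                         # kernel-weighted average of value vectors

rng = np.random.default_rng(0)
n, d_model, d_head = 6, 16, 8                  # toy sizes, assumed for illustration
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (0.1 * rng.normal(size=(d_model, d_head)) for _ in range(3))
print(kernel_view_attention(X, Wq, Wk, Wv).shape)   # (6, 8)
```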

SiMBA: Simplified Mamba-based architecture for vision and multivariate time series

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2403.15360, 2024 - arxiv.org
Transformers have widely adopted attention networks for sequence mixing and MLPs for
channel mixing, playing a pivotal role in achieving breakthroughs across domains. However …
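
As background for the terminology above, the sketch below (PyTorch, toy dimensions chosen for illustration) shows the two halves of a standard Transformer block: a sequence-mixing step in which tokens attend to each other, and a channel-mixing MLP applied per token. Only the vanilla block is shown; SiMBA's Mamba-based replacement for the attention part is not implemented here.

```python
# Illustrative sketch of sequence mixing vs. channel mixing in a vanilla Transformer block.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, x):                      # x: (batch, tokens, channels)
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]          # sequence mixing: tokens attend to tokens
        x = x + self.mlp(self.norm2(x))        # channel mixing: per-token feature MLP
        return x

x = torch.randn(2, 10, 64)
print(TransformerBlock()(x).shape)             # torch.Size([2, 10, 64])
```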

Scattering vision transformer: Spectral mixing matters

B Patro, V Agneeswaran - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

BN Patro, VP Namboodiri, VS Agneeswaran - arXiv preprint arXiv …, 2023 - arxiv.org
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …

Transformer meets boundary value inverse problems

R Guo, S Cao, L Chen - arXiv preprint arXiv:2209.14977, 2022 - arxiv.org
A Transformer-based deep direct sampling method is proposed for electrical impedance
tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A …

Transformers implement functional gradient descent to learn non-linear functions in context

X Cheng, Y Chen, S Sra - arXiv preprint arXiv:2312.06528, 2023 - arxiv.org
Many neural network architectures have been shown to be Turing Complete, and can thus
implement arbitrary algorithms. However, Transformers are unique in that they can …

Improving transformer with an admixture of attention heads

T Nguyen, T Nguyen, H Do, K Nguyen… - Advances in neural …, 2022 - proceedings.neurips.cc
Transformers with multi-head self-attention have achieved remarkable success in sequence
modeling and beyond. However, they suffer from high computational and memory …
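
For reference on where that cost comes from, here is a toy NumPy sketch of plain multi-head self-attention (not the paper's admixture-of-heads mechanism): each head materializes an n-by-n attention matrix, so compute and memory grow quadratically with sequence length.

```python
# Illustrative only: vanilla multi-head self-attention, showing the quadratic term.
import numpy as np

def multi_head_attention(X, n_heads):
    n, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        # per-head slices stand in for learned projections in this toy sketch
        Q = K = V = X[:, h * d_head:(h + 1) * d_head]
        scores = Q @ K.T / np.sqrt(d_head)        # (n, n): the quadratic cost per head
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        heads.append(weights @ V)
    return np.concatenate(heads, axis=-1)         # concatenate heads back to d_model

X = np.random.default_rng(0).normal(size=(128, 64))
out = multi_head_attention(X, n_heads=8)
print(out.shape)   # (128, 64); each head stored a 128 x 128 attention matrix
```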

Multi-scale time-stepping of Partial Differential Equations with transformers

AP Hemmasian, AB Farimani - Computer Methods in Applied Mechanics …, 2024 - Elsevier
Developing fast surrogates for Partial Differential Equations (PDEs) will accelerate design
and optimization in almost all scientific and engineering applications. Neural networks have …