Advancing transformer architecture in long-context large language models: A comprehensive survey

Y Huang, J Xu, J Lai, Z Jiang, T Chen, Z Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Sparked by the breakthrough of ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …

Scalable transformer for PDE surrogate modeling

Z Li, D Shu, A Barati Farimani - Advances in Neural …, 2024 - proceedings.neurips.cc
Transformer has shown state-of-the-art performance on various applications and has
recently emerged as a promising tool for surrogate modeling of partial differential equations …

Primal-attention: Self-attention through asymmetric kernel SVD in primal representation

Y Chen, Q Tao, F Tonin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, a new line of work has emerged to understand and improve self-attention in
Transformers by treating it as a kernel machine. However, existing works apply the methods …
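
To make that kernel-machine reading concrete, the following is a minimal sketch in plain NumPy (toy random projections, no training) of the standard view the abstract alludes to: softmax self-attention rewritten as a row-normalized asymmetric kernel smoother. It is not an implementation of the Primal-Attention SVD method itself.

```python
# Illustrative only: the "attention as a kernel machine" view, not Primal-Attention.
import numpy as np

def kernel_view_attention(X, Wq, Wk, Wv):
    """Softmax self-attention written as a normalized asymmetric kernel smoother."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # queries, keys, values
    d = Q.shape[-1]
    kernel = np.exp(Q @ K.T / np.sqrt(d))      # asymmetric kernel k(q_i, k_j)
    weights = kernel / kernel.sum(axis=-1, keepdims=True)  # row-normalization = softmax
    return weights @ V                         # kernel-weighted average of value vectors

rng = np.random.default_rng(0)
n, d_model, d_head = 6, 16, 8                  # toy sizes, assumed for illustration
X = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (0.1 * rng.normal(size=(d_model, d_head)) for _ in range(3))
print(kernel_view_attention(X, Wq, Wk, Wv).shape)   # (6, 8)
```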

SiMBA: Simplified Mamba-based architecture for vision and multivariate time series

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2403.15360, 2024 - arxiv.org
Transformers have widely adopted attention networks for sequence mixing and MLPs for
channel mixing, playing a pivotal role in achieving breakthroughs across domains. However …
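
As background for the terminology above, the sketch below (PyTorch, toy dimensions chosen for illustration) shows the two halves of a standard Transformer block: a sequence-mixing step in which tokens attend to each other, and a channel-mixing MLP applied per token. Only the vanilla block is shown; SiMBA's Mamba-based replacement for the attention part is not implemented here.

```python
# Illustrative sketch of sequence mixing vs. channel mixing in a vanilla Transformer block.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, d_ff=256):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))

    def forward(self, x):                      # x: (batch, tokens, channels)
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]          # sequence mixing: tokens attend to tokens
        x = x + self.mlp(self.norm2(x))        # channel mixing: per-token feature MLP
        return x

x = torch.randn(2, 10, 64)
print(TransformerBlock()(x).shape)             # torch.Size([2, 10, 64])
```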

Scattering vision transformer: Spectral mixing matters

B Patro, V Agneeswaran - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …

SpectFormer: Frequency and Attention is what you need in a Vision Transformer

BN Patro, VP Namboodiri, VS Agneeswaran - arXiv preprint arXiv …, 2023 - arxiv.org
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …

Transformer meets boundary value inverse problems

R Guo, S Cao, L Chen - arXiv preprint arXiv:2209.14977, 2022 - arxiv.org
A Transformer-based deep direct sampling method is proposed for electrical impedance
tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A …

Transformers implement functional gradient descent to learn non-linear functions in context

X Cheng, Y Chen, S Sra - arXiv preprint arXiv:2312.06528, 2023 - arxiv.org
Many neural network architectures have been shown to be Turing Complete, and can thus
implement arbitrary algorithms. However, Transformers are unique in that they can …

Improving transformer with an admixture of attention heads

T Nguyen, T Nguyen, H Do, K Nguyen… - Advances in neural …, 2022 - proceedings.neurips.cc
Transformers with multi-head self-attention have achieved remarkable success in sequence
modeling and beyond. However, they suffer from high computational and memory …
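
For reference on where that cost comes from, here is a toy NumPy sketch of plain multi-head self-attention (not the paper's admixture-of-heads mechanism): each head materializes an n-by-n attention matrix, so compute and memory grow quadratically with sequence length.

```python
# Illustrative only: vanilla multi-head self-attention, showing the quadratic term.
import numpy as np

def multi_head_attention(X, n_heads):
    n, d_model = X.shape
    d_head = d_model // n_heads
    heads = []
    for h in range(n_heads):
        # per-head slices stand in for learned projections in this toy sketch
        Q = K = V = X[:, h * d_head:(h + 1) * d_head]
        scores = Q @ K.T / np.sqrt(d_head)        # (n, n): the quadratic cost per head
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        heads.append(weights @ V)
    return np.concatenate(heads, axis=-1)         # concatenate heads back to d_model

X = np.random.default_rng(0).normal(size=(128, 64))
out = multi_head_attention(X, n_heads=8)
print(out.shape)   # (128, 64); each head stored a 128 x 128 attention matrix
```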

Multi-scale time-stepping of Partial Differential Equations with transformers

AP Hemmasian, AB Farimani - Computer Methods in Applied Mechanics …, 2024 - Elsevier
Developing fast surrogates for Partial Differential Equations (PDEs) will accelerate design
and optimization in almost all scientific and engineering applications. Neural networks have …