Advancing transformer architecture in long-context large language models: A comprehensive survey
With the spark ignited by ChatGPT, Transformer-based Large Language Models (LLMs)
have paved a revolutionary path toward Artificial General Intelligence (AGI) and have been …
Scalable transformer for PDE surrogate modeling
Transformer has shown state-of-the-art performance on various applications and has
recently emerged as a promising tool for surrogate modeling of partial differential equations …
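As a rough illustration of what such a surrogate looks like, the sketch below treats the values of a discretized solution field as tokens and lets a standard transformer encoder map the state at one time step to the next. The grid size, widths, and depths are illustrative assumptions, not the architecture proposed in the paper.

```python
# Minimal sketch (not the paper's architecture): a transformer surrogate mapping a
# discretized PDE state u(t) on an N-point grid to u(t + dt). All sizes are illustrative.
import torch
import torch.nn as nn

class PDESurrogate(nn.Module):
    def __init__(self, n_points=256, d_model=128, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                           # lift scalar field values to tokens
        self.pos = nn.Parameter(torch.zeros(1, n_points, d_model))   # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.readout = nn.Linear(d_model, 1)                         # project tokens back to field values

    def forward(self, u):                                            # u: (batch, n_points)
        h = self.embed(u.unsqueeze(-1)) + self.pos
        h = self.encoder(h)
        return self.readout(h).squeeze(-1)                           # predicted field at the next step

model = PDESurrogate()
u_next = model(torch.randn(8, 256))                                  # one autoregressive step
```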
Primal-attention: Self-attention through asymmetric kernel SVD in primal representation
Recently, a new line of works has emerged to understand and improve self-attention in
Transformers by treating it as a kernel machine. However, existing works apply the methods …
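The starting point for this line of work is the reading of softmax attention as a normalized, asymmetric kernel machine, with kappa(q, k) = exp(q·k / sqrt(d)) playing the role of the kernel. The sketch below spells out that baseline identity; the asymmetric-SVD primal representation introduced by the paper is not reproduced.

```python
# Softmax attention written explicitly as a (normalized, asymmetric) kernel machine:
# out_i = sum_j kappa(q_i, k_j) v_j / sum_j kappa(q_i, k_j),  kappa(q, k) = exp(q.k / sqrt(d)).
# This is the standard kernel view such works start from, not Primal-Attention itself.
import numpy as np

def kernel_attention(Q, K, V):
    d = Q.shape[-1]
    scores = np.exp(Q @ K.T / np.sqrt(d))             # asymmetric kernel kappa(q_i, k_j)
    weights = scores / scores.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((5, 8)) for _ in range(3))
print(kernel_attention(Q, K, V).shape)                # (5, 8)
```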
SiMBA: Simplified Mamba-based architecture for vision and multivariate time series
B. N. Patro, V. S. Agneeswaran. arXiv preprint arXiv:2403.15360, 2024.
Transformers have widely adopted attention networks for sequence mixing and MLPs for
channel mixing, playing a pivotal role in achieving breakthroughs across domains. However …
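The sequence-mixing / channel-mixing split mentioned above is the standard two-sublayer transformer block sketched below, with self-attention as the sequence mixer and an MLP as the channel mixer. SiMBA swaps the attention mixer for a Mamba-style state-space layer, which is not reproduced here; dimensions are illustrative.

```python
# Generic two-sublayer block: token/sequence mixing followed by channel mixing.
# This sketches the split described above, not SiMBA's specific mixers.
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=4, mlp_ratio=4):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.seq_mix = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # sequence mixing
        self.norm2 = nn.LayerNorm(d_model)
        self.chan_mix = nn.Sequential(                                            # channel mixing
            nn.Linear(d_model, mlp_ratio * d_model), nn.GELU(),
            nn.Linear(mlp_ratio * d_model, d_model))

    def forward(self, x):                                                         # x: (batch, seq, d_model)
        h = self.norm1(x)
        x = x + self.seq_mix(h, h, h, need_weights=False)[0]
        return x + self.chan_mix(self.norm2(x))
```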
Scattering vision transformer: Spectral mixing matters
B. Patro, V. Agneeswaran. Advances in Neural Information Processing Systems, 2024.
Vision transformers have gained significant attention and achieved state-of-the-art
performance in various computer vision tasks, including image classification, instance …
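A generic form of spectral token mixing (FFT, learnable filter, inverse FFT) is sketched below as a stand-in for the spectral component discussed here; the paper's actual scattering-transform layers are not reproduced, and the sequence length and width are assumptions.

```python
# Generic spectral token mixing: FFT over the token dimension, learnable complex filter,
# inverse FFT. A stand-in for "spectral mixing", not the paper's scattering network.
import torch
import torch.nn as nn

class SpectralMixer(nn.Module):
    def __init__(self, seq_len=196, d_model=256):
        super().__init__()
        n_freq = seq_len // 2 + 1
        self.filt = nn.Parameter(torch.ones(n_freq, d_model, dtype=torch.cfloat))  # learnable filter

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        f = torch.fft.rfft(x, dim=1)               # mix tokens in the frequency domain
        f = f * self.filt
        return torch.fft.irfft(f, n=x.shape[1], dim=1)
```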
SpectFormer: Frequency and Attention is what you need in a Vision Transformer
Vision transformers have been applied successfully for image recognition tasks. There have
been either multi-headed self-attention based (ViT, DeiT, …
Transformer meets boundary value inverse problems
A Transformer-based deep direct sampling method is proposed for electrical impedance
tomography, a well-known severely ill-posed nonlinear boundary value inverse problem. A …
Transformers implement functional gradient descent to learn non-linear functions in context
Many neural network architectures have been shown to be Turing Complete, and can thus
implement arbitrary algorithms. However, Transformers are unique in that they can …
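The linear special case of this claim can be checked in a few lines: for in-context least-squares regression, the prediction after one gradient-descent step from zero initialization coincides with an unnormalized linear-attention readout over the context pairs. The snippet below verifies that identity numerically; the paper's functional (kernel) gradient-descent argument for non-linear functions is the generalization of this.

```python
# Core identity, checked numerically for the linear case: one gradient step from w = 0 on
# the in-context least-squares loss predicts y_hat = eta * sum_i (x_q . x_i) y_i, which is
# exactly an unnormalized linear-attention readout with query x_q, keys x_i, values y_i.
import numpy as np

rng = np.random.default_rng(0)
d, n, eta = 4, 16, 0.1
X = rng.standard_normal((n, d))            # in-context inputs x_i
y = X @ rng.standard_normal(d)             # in-context targets y_i
x_q = rng.standard_normal(d)               # query point

# One gradient step on L(w) = 0.5 * sum_i (w.x_i - y_i)^2, starting from w = 0.
w1 = eta * X.T @ y
pred_gd = w1 @ x_q

# Linear-attention readout: values y_i weighted by unnormalized scores x_q . x_i.
pred_attn = eta * (X @ x_q) @ y

print(np.isclose(pred_gd, pred_attn))      # True
```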
Improving transformer with an admixture of attention heads
Transformers with multi-head self-attention have achieved remarkable success in sequence
modeling and beyond. However, they suffer from high computational and memory …
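For reference, the cost being addressed is that of standard multi-head self-attention, where every head materializes its own n × n attention matrix, giving O(H·n²) compute and memory in the sequence length n. The snippet below shows that baseline with PyTorch's built-in module; the head-admixture mechanism itself is not reproduced.

```python
# Baseline whose cost the snippet refers to: standard multi-head self-attention.
# Each of the H heads forms its own n x n attention matrix, so cost scales as O(H * n^2).
import torch
import torch.nn as nn

n, d_model, n_heads = 512, 256, 8
mha = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
x = torch.randn(2, n, d_model)
out, attn = mha(x, x, x, need_weights=True, average_attn_weights=False)
print(out.shape, attn.shape)    # (2, 512, 256) and per-head weights (2, 8, 512, 512)
```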
Multi-scale time-stepping of Partial Differential Equations with transformers
A. P. Hemmasian, A. B. Farimani. Computer Methods in Applied Mechanics and Engineering, 2024.
Developing fast surrogates for Partial Differential Equations (PDEs) will accelerate design
and optimization in almost all scientific and engineering applications. Neural networks have …
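One simple way to realize the multi-scale time-stepping idea in the title is to keep a separate learned surrogate per time stride and reach a target horizon by composing coarse jumps first and finer steps afterwards. The sketch below illustrates that composition with placeholder surrogates; it is an assumption about the general scheme, not the paper's exact models.

```python
# Illustration of multi-scale time-stepping with learned surrogates (not the paper's exact
# scheme): one surrogate per stride, composed coarse-to-fine to reach a target horizon.
import torch

def rollout(u0, surrogates, target_steps):
    """surrogates: dict mapping stride (in base steps) -> callable u_t -> u_{t+stride}."""
    u, remaining = u0, target_steps
    for stride in sorted(surrogates, reverse=True):     # coarse jumps first, fine steps last
        while remaining >= stride:
            u = surrogates[stride](u)
            remaining -= stride
    return u

# Placeholder surrogates standing in for trained transformer models at each stride.
surrogates = {16: lambda u: u + 16.0, 4: lambda u: u + 4.0, 1: lambda u: u + 1.0}
print(rollout(torch.zeros(3), surrogates, target_steps=37))   # tensor([37., 37., 37.])
```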