Choose a transformer: Fourier or Galerkin

S Cao - Advances in neural information processing systems, 2021 - proceedings.neurips.cc
In this paper, we apply the self-attention from the state-of-the-art Transformer in Attention Is
All You Need for the first time to a data-driven operator learning problem related to partial …
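
The attention variant the title refers to can be illustrated with a small sketch. Below is a generic numpy version of softmax-free, Galerkin-type attention, where layer-normalized keys and values are multiplied first so the cost scales linearly in sequence length; the normalization details and toy data are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # Normalize over the feature (last) axis; learned affine omitted for brevity.
        mu = x.mean(axis=-1, keepdims=True)
        sigma = x.std(axis=-1, keepdims=True)
        return (x - mu) / (sigma + eps)

    def galerkin_attention(q, k, v):
        # Softmax-free attention: q @ (norm(k).T @ norm(v)) / n.
        # The (d x d) product is formed first, so cost is O(n d^2) rather than O(n^2 d).
        n = q.shape[0]
        return q @ (layer_norm(k).T @ layer_norm(v)) / n

    # Toy usage on a discretized 1-D field: n grid points, d channels.
    rng = np.random.default_rng(0)
    n, d = 256, 32
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = galerkin_attention(q, k, v)   # shape (n, d)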

Depression detection in speech using transformer and parallel convolutional neural networks

F Yin, J Du, X Xu, L Zhao - Electronics, 2023 - mdpi.com
As a common mental disorder, depression has become a major threat to human health and
may even severely affect one's daily life. Considering this background, it is necessary to …

FMMformer: Efficient and flexible transformer via decomposed near-field and far-field attention

T Nguyen, V Suliafu, S Osher… - Advances in neural …, 2021 - proceedings.neurips.cc
We propose FMMformers, a class of efficient and flexible transformers inspired by the
celebrated fast multipole method (FMM) for accelerating interacting particle simulation. FMM …
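
The near-field/far-field decomposition named in the title can be sketched generically: exact softmax attention restricted to a band around the diagonal covers near interactions, and a kernelized linear-attention term approximates the far field. The band width and the elu(x)+1 feature map below are illustrative choices, not the paper's configuration.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def fmm_style_attention(q, k, v, band=8):
        n, d = q.shape
        # Near field: exact attention inside a band of width `band` around the diagonal.
        scores = q @ k.T / np.sqrt(d)
        in_band = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= band
        near_out = softmax(np.where(in_band, scores, -np.inf), axis=-1) @ v
        # Far field: global, low-rank linear attention with an elu(x)+1 feature map.
        phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
        far_out = phi(q) @ (phi(k).T @ v) / n
        return near_out + far_out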

Uniform memory retrieval with larger capacity for modern hopfield models

D Wu, JYC Hu, TY Hsiao, H Liu - arXiv preprint arXiv:2404.03827, 2024 - arxiv.org
We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed
$\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a …
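
As background for the two-stage dynamics, the standard single-stage retrieval rule of a modern Hopfield model is the softmax update sketched below; this is a generic baseline, not the proposed $\mathtt{U\text{-}Hop}$ method.

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def hopfield_retrieve(memories, query, beta=4.0, steps=3):
        # Retrieval dynamics: xi <- X^T softmax(beta * X @ xi),
        # where the rows of `memories` (X) are the stored patterns.
        xi = query.copy()
        for _ in range(steps):
            xi = memories.T @ softmax(beta * memories @ xi)
        return xi

    # Toy usage: recover a stored pattern from a noisy query.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 16))                # 5 stored patterns of dimension 16
    noisy = X[2] + 0.1 * rng.standard_normal(16)
    print(np.allclose(hopfield_retrieve(X, noisy), X[2], atol=0.2))   # True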

Momentum transformer: Closing the performance gap between self-attention and its linearization

TM Nguyen, R Baraniuk, R Kirby… - Mathematical and …, 2022 - proceedings.mlr.press
Transformers have achieved remarkable success in sequence modeling and beyond but
suffer from quadratic computational and memory complexities with respect to the length of …

Attention as Robust Representation for Time Series Forecasting

PS Niu, T Zhou, X Wang, L Sun, R Jin - arXiv preprint arXiv:2402.05370, 2024 - arxiv.org
Time series forecasting is essential for many practical applications, with the adoption of
transformer-based models on the rise due to their impressive performance in NLP and CV …

How does momentum benefit deep neural networks architecture design? A few case studies

B Wang, H Xia, T Nguyen, S Osher - Research in the Mathematical …, 2022 - Springer
We present and review an algorithmic and theoretical framework for improving neural
network architecture design via momentum. As case studies, we consider how momentum …
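
The classical ingredient behind these case studies is the heavy-ball momentum update; a minimal worked example of that update on a quadratic (not a reconstruction of the paper's architectures) is:

    import numpy as np

    def heavy_ball(grad, x0, lr=0.1, beta=0.9, steps=100):
        # Heavy-ball momentum: x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1}).
        x_prev, x = x0.copy(), x0.copy()
        for _ in range(steps):
            x_next = x - lr * grad(x) + beta * (x - x_prev)
            x_prev, x = x, x_next
        return x

    # Toy quadratic f(x) = 0.5 * x^T A x with an ill-conditioned A.
    A = np.diag([1.0, 25.0])
    print(heavy_ball(lambda x: A @ x, np.array([1.0, 1.0])))   # close to the minimizer [0, 0]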

Spectraformer: A Unified Random Feature Framework for Transformer

D Nguyen, A Joshi, F Salim - arXiv preprint arXiv:2405.15310, 2024 - arxiv.org
Linearization of attention using various kernel approximation and kernel learning techniques
has shown promise. Past methods use a subset of combinations of component functions and …
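
The kernel view behind such linearizations can be illustrated with positive random features (Performer-style): softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V) with a row-wise normalizer, giving cost linear in sequence length. The specific feature map below is one common choice, not Spectraformer's unified framework.

    import numpy as np

    def positive_random_features(x, omega):
        # Positive random features for the softmax kernel:
        # exp(q.k) ~= E_w[exp(w.q - |q|^2/2) * exp(w.k - |k|^2/2)], w ~ N(0, I).
        m = omega.shape[1]
        return np.exp(x @ omega - (x ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)

    def random_feature_attention(q, k, v, n_features=128, seed=0):
        d = q.shape[-1]
        omega = np.random.default_rng(seed).standard_normal((d, n_features))
        scale = d ** -0.25                       # folds the 1/sqrt(d) into q and k
        qf = positive_random_features(q * scale, omega)
        kf = positive_random_features(k * scale, omega)
        out = qf @ (kf.T @ v)                    # O(n * m * d); never forms an n x n matrix
        normalizer = qf @ kf.sum(axis=0, keepdims=True).T
        return out / normalizer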

Meta-Learning for Medium-shot Sparse Learning via Deep Kernels

Z Adabi Firuzjaee, SK Ghiasi-Shirazi - Computer and Knowledge …, 2022 - cke.um.ac.ir
Few-shot learning assumes that we have a very small dataset for each task and trains a
model on the set of tasks. For real-world problems, however, the amount of available data is …

BiXT: Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers

M Hiller, KA Ehinger, T Drummond - openreview.net
We present a novel bi-directional Transformer architecture (BiXT) for which computational
cost and memory consumption scale linearly with input size, but without suffering the drop in …
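
The linear scaling claimed in the abstract is characteristic of architectures that route attention through a small, fixed set of latent tokens: latents attend to the input tokens and the tokens attend back to the latents, so cost grows with N rather than N^2. The single-head, projection-free sketch below illustrates that general pattern and is not the exact BiXT block.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def bidirectional_cross_attention(latents, tokens):
        # One bi-directional exchange between M latents and N tokens: cost O(M * N).
        d = latents.shape[-1]
        scores = latents @ tokens.T / np.sqrt(d)            # (M, N), computed once
        new_latents = softmax(scores, axis=1) @ tokens      # latents gather from tokens
        new_tokens = softmax(scores.T, axis=1) @ latents    # tokens gather from latents
        return new_latents, new_tokens

    # Toy usage: 16 latents exchange information with 1024 tokens.
    rng = np.random.default_rng(2)
    lat, tok = rng.standard_normal((16, 64)), rng.standard_normal((1024, 64))
    lat, tok = bidirectional_cross_attention(lat, tok)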