Choose a transformer: Fourier or Galerkin
S Cao - Advances in neural information processing systems, 2021 - proceedings.neurips.cc
In this paper, we apply the self-attention from the state-of-the-art Transformer in "Attention Is
All You Need" for the first time to a data-driven operator learning problem related to partial …
Depression detection in speech using transformer and parallel convolutional neural networks
F Yin, J Du, X Xu, L Zhao - Electronics, 2023 - mdpi.com
As a common mental disorder, depression has become a major threat to human health and
may even heavily influence one's daily life. Considering this background, it is necessary to …
FMMformer: Efficient and flexible transformer via decomposed near-field and far-field attention
We propose FMMformers, a class of efficient and flexible transformers inspired by the
celebrated fast multipole method (FMM) for accelerating interacting particle simulation. FMM …
Uniform memory retrieval with larger capacity for modern Hopfield models
We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed
$\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a …
Momentum transformer: Closing the performance gap between self-attention and its linearization
Transformers have achieved remarkable success in sequence modeling and beyond but
suffer from quadratic computational and memory complexities with respect to the length of …
Attention as Robust Representation for Time Series Forecasting
Time series forecasting is essential for many practical applications, with the adoption of
transformer-based models on the rise due to their impressive performance in NLP and CV …
How does momentum benefit deep neural networks architecture design? A few case studies
We present and review an algorithmic and theoretical framework for improving neural
network architecture design via momentum. As case studies, we consider how momentum …
Spectraformer: A Unified Random Feature Framework for Transformer
Linearization of attention using various kernel approximation and kernel learning techniques
has shown promise. Past methods use a subset of combinations of component functions and …
Meta-Learning for Medium-shot Sparse Learning via Deep Kernels
Z Adabi Firuzjaee, SK Ghiasi-Shirazi - Computer and Knowledge …, 2022 - cke.um.ac.ir
Few-shot learning assumes that we have a very small dataset for each task and trains a
model on the set of tasks. For real-world problems, however, the amount of available data is …
BiXT: Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
We present a novel bi-directional Transformer architecture (BiXT) for which computational
cost and memory consumption scale linearly with input size, but without suffering the drop in …