Choose a transformer: Fourier or Galerkin

S Cao - Advances in neural information processing systems, 2021 - proceedings.neurips.cc
In this paper, we apply the self-attention from the state-of-the-art Transformer in Attention Is
All You Need for the first time to a data-driven operator learning problem related to partial …
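
The attention variant the title refers to can be illustrated with a small sketch. Below is a generic numpy version of softmax-free, Galerkin-type attention, where layer-normalized keys and values are multiplied first so the cost scales linearly in sequence length; the normalization details and toy data are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    def layer_norm(x, eps=1e-5):
        # Normalize over the feature (last) axis; learned affine omitted for brevity.
        mu = x.mean(axis=-1, keepdims=True)
        sigma = x.std(axis=-1, keepdims=True)
        return (x - mu) / (sigma + eps)

    def galerkin_attention(q, k, v):
        # Softmax-free attention: q @ (norm(k).T @ norm(v)) / n.
        # The (d x d) product is formed first, so cost is O(n d^2) rather than O(n^2 d).
        n = q.shape[0]
        return q @ (layer_norm(k).T @ layer_norm(v)) / n

    # Toy usage on a discretized 1-D field: n grid points, d channels.
    rng = np.random.default_rng(0)
    n, d = 256, 32
    q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
    out = galerkin_attention(q, k, v)   # shape (n, d)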

Depression detection in speech using transformer and parallel convolutional neural networks

F Yin, J Du, X Xu, L Zhao - Electronics, 2023 - mdpi.com
As a common mental disorder, depression has become a major threat to human health and
may even severely affect one's daily life. Considering this background, it is necessary to …

FMMformer: Efficient and flexible transformer via decomposed near-field and far-field attention

T Nguyen, V Suliafu, S Osher… - Advances in neural …, 2021 - proceedings.neurips.cc
We propose FMMformers, a class of efficient and flexible transformers inspired by the
celebrated fast multipole method (FMM) for accelerating interacting particle simulation. FMM …
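
The near-field/far-field decomposition named in the title can be sketched generically: exact softmax attention restricted to a band around the diagonal covers near interactions, and a kernelized linear-attention term approximates the far field. The band width and the elu(x)+1 feature map below are illustrative choices, not the paper's configuration.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def fmm_style_attention(q, k, v, band=8):
        n, d = q.shape
        # Near field: exact attention inside a band of width `band` around the diagonal.
        scores = q @ k.T / np.sqrt(d)
        in_band = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= band
        near_out = softmax(np.where(in_band, scores, -np.inf), axis=-1) @ v
        # Far field: global, low-rank linear attention with an elu(x)+1 feature map.
        phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
        far_out = phi(q) @ (phi(k).T @ v) / n
        return near_out + far_out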

Uniform memory retrieval with larger capacity for modern hopfield models

D Wu, JYC Hu, TY Hsiao, H Liu - arXiv preprint arXiv:2404.03827, 2024 - arxiv.org
We propose a two-stage memory retrieval dynamics for modern Hopfield models, termed
$\mathtt{U\text{-}Hop}$, with enhanced memory capacity. Our key contribution is a …
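
As background for the two-stage dynamics, the standard single-stage retrieval rule of a modern Hopfield model is the softmax update sketched below; this is a generic baseline, not the proposed $\mathtt{U\text{-}Hop}$ method.

    import numpy as np

    def softmax(x):
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    def hopfield_retrieve(memories, query, beta=4.0, steps=3):
        # Retrieval dynamics: xi <- X^T softmax(beta * X @ xi),
        # where the rows of `memories` (X) are the stored patterns.
        xi = query.copy()
        for _ in range(steps):
            xi = memories.T @ softmax(beta * memories @ xi)
        return xi

    # Toy usage: recover a stored pattern from a noisy query.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((5, 16))                # 5 stored patterns of dimension 16
    noisy = X[2] + 0.1 * rng.standard_normal(16)
    print(np.allclose(hopfield_retrieve(X, noisy), X[2], atol=0.2))   # True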

Momentum transformer: Closing the performance gap between self-attention and its linearization

TM Nguyen, R Baraniuk, R Kirby… - Mathematical and …, 2022 - proceedings.mlr.press
Transformers have achieved remarkable success in sequence modeling and beyond but
suffer from quadratic computational and memory complexities with respect to the length of …

Attention as Robust Representation for Time Series Forecasting

PS Niu, T Zhou, X Wang, L Sun, R Jin - arXiv preprint arXiv:2402.05370, 2024 - arxiv.org
Time series forecasting is essential for many practical applications, with the adoption of
transformer-based models on the rise due to their impressive performance in NLP and CV …

How does momentum benefit deep neural networks architecture design? A few case studies

B Wang, H Xia, T Nguyen, S Osher - Research in the Mathematical …, 2022 - Springer
We present and review an algorithmic and theoretical framework for improving neural
network architecture design via momentum. As case studies, we consider how momentum …
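
The classical ingredient behind these case studies is the heavy-ball momentum update; a minimal worked example of that update on a quadratic (not a reconstruction of the paper's architectures) is:

    import numpy as np

    def heavy_ball(grad, x0, lr=0.1, beta=0.9, steps=100):
        # Heavy-ball momentum: x_{k+1} = x_k - lr * grad(x_k) + beta * (x_k - x_{k-1}).
        x_prev, x = x0.copy(), x0.copy()
        for _ in range(steps):
            x_next = x - lr * grad(x) + beta * (x - x_prev)
            x_prev, x = x, x_next
        return x

    # Toy quadratic f(x) = 0.5 * x^T A x with an ill-conditioned A.
    A = np.diag([1.0, 25.0])
    print(heavy_ball(lambda x: A @ x, np.array([1.0, 1.0])))   # close to the minimizer [0, 0]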

Spectraformer: A Unified Random Feature Framework for Transformer

D Nguyen, A Joshi, F Salim - arXiv preprint arXiv:2405.15310, 2024 - arxiv.org
Linearization of attention using various kernel approximation and kernel learning techniques
has shown promise. Past methods use a subset of combinations of component functions and …
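
The kernel view behind such linearizations can be illustrated with positive random features (Performer-style): softmax(QK^T)V is replaced by phi(Q)(phi(K)^T V) with a row-wise normalizer, giving cost linear in sequence length. The specific feature map below is one common choice, not Spectraformer's unified framework.

    import numpy as np

    def positive_random_features(x, omega):
        # Positive random features for the softmax kernel:
        # exp(q.k) ~= E_w[exp(w.q - |q|^2/2) * exp(w.k - |k|^2/2)], w ~ N(0, I).
        m = omega.shape[1]
        return np.exp(x @ omega - (x ** 2).sum(-1, keepdims=True) / 2) / np.sqrt(m)

    def random_feature_attention(q, k, v, n_features=128, seed=0):
        d = q.shape[-1]
        omega = np.random.default_rng(seed).standard_normal((d, n_features))
        scale = d ** -0.25                       # folds the 1/sqrt(d) into q and k
        qf = positive_random_features(q * scale, omega)
        kf = positive_random_features(k * scale, omega)
        out = qf @ (kf.T @ v)                    # O(n * m * d); never forms an n x n matrix
        normalizer = qf @ kf.sum(axis=0, keepdims=True).T
        return out / normalizer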

Meta-Learning for Medium-shot Sparse Learning via Deep Kernels

Z Adabi Firuzjaee, SK Ghiasi-Shirazi - Computer and Knowledge …, 2022 - cke.um.ac.ir
Few-shot learning assumes that we have a very small dataset for each task and trains a
model on the set of tasks. For real-world problems, however, the amount of available data is …

BiXT: Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers

M Hiller, KA Ehinger, T Drummond - openreview.net
We present a novel bi-directional Transformer architecture (BiXT) for which computational
cost and memory consumption scale linearly with input size, but without suffering the drop in …
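
The linear scaling claimed in the abstract is characteristic of architectures that route attention through a small, fixed set of latent tokens: latents attend to the input tokens and the tokens attend back to the latents, so cost grows with N rather than N^2. The single-head, projection-free sketch below illustrates that general pattern and is not the exact BiXT block.

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def bidirectional_cross_attention(latents, tokens):
        # One bi-directional exchange between M latents and N tokens: cost O(M * N).
        d = latents.shape[-1]
        scores = latents @ tokens.T / np.sqrt(d)            # (M, N), computed once
        new_latents = softmax(scores, axis=1) @ tokens      # latents gather from tokens
        new_tokens = softmax(scores.T, axis=1) @ latents    # tokens gather from latents
        return new_latents, new_tokens

    # Toy usage: 16 latents exchange information with 1024 tokens.
    rng = np.random.default_rng(2)
    lat, tok = rng.standard_normal((16, 64)), rng.standard_normal((1024, 64))
    lat, tok = bidirectional_cross_attention(lat, tok)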