Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …

Griffin: Mixing gated linear recurrences with local attention for efficient language models

S De, SL Smith, A Fernando, A Botev… - arXiv preprint arXiv …, 2024 - arxiv.org
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long
sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with …
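
As a rough illustration of the recurrence family named in the title, the sketch below implements a generic diagonal gated linear recurrence in NumPy; the parameterization (a per-channel sigmoid forget gate mixing the previous state with the current input) is an assumption for exposition, not the Hawk/Griffin layer from the paper.

import numpy as np

def gated_linear_recurrence(x, w_gate, b_gate):
    # x: (T, d) input sequence; returns (T, d) hidden states.
    # h_t = a_t * h_{t-1} + (1 - a_t) * x_t, with a_t = sigmoid(x_t @ w_gate + b_gate).
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ w_gate + b_gate)))  # per-channel forget gate in (0, 1)
        h = a * h + (1.0 - a) * x[t]                          # element-wise (diagonal) recurrence
        out[t] = h
    return out

rng = np.random.default_rng(0)
T, d = 8, 4
y = gated_linear_recurrence(rng.normal(size=(T, d)),
                            rng.normal(size=(d, d)) * 0.1,
                            np.zeros(d))
print(y.shape)  # (8, 4)

Because the recurrence is element-wise and linear in h, layers of this form admit parallel scan training while keeping constant-memory sequential inference, which is the trade-off the abstract alludes to.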

Convolutional state space models for long-range spatiotemporal modeling

J Smith, S De Mello, J Kautz… - Advances in Neural …, 2024 - proceedings.neurips.cc
Effectively modeling long spatiotemporal sequences is challenging due to the need to model
complex spatial correlations and long-range temporal dependencies simultaneously …

Gated recurrent neural networks discover attention

N Zucchet, S Kobayashi, Y Akram… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent architectural developments have enabled recurrent neural networks (RNNs) to reach
and even surpass the performance of Transformers on certain sequence modeling tasks …

Recurrent Distance Filtering for Graph Representation Learning

Y Ding, A Orvieto, B He, T Hofmann - Forty-first International …, 2024 - openreview.net
Graph neural networks based on iterative one-hop message passing have been shown to
struggle in harnessing the information from distant nodes effectively. Conversely, graph …

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Y Chou, M Yao, K Wang, Y Pan, R Zhu, Y Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
Various linear complexity models, such as Linear Transformer (LinFormer), State Space
Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional …
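
For context on the model class being unified, the following sketch contrasts quadratic softmax attention with a kernelized linear-attention approximation in NumPy; the feature map phi (elu + 1) and the single-head setup are illustrative assumptions, not MetaLA's construction.

import numpy as np

def softmax_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (T, T): quadratic in sequence length
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v, phi=lambda x: np.where(x > 0, x + 1.0, np.exp(x))):
    qp, kp = phi(q), phi(k)                           # non-negative feature maps
    kv = kp.T @ v                                     # (d, d_v): accumulated once, linear in T
    z = qp @ kp.sum(axis=0)                           # per-query normalizer
    return (qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
T, d = 16, 8
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)  # (16, 8) (16, 8)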

Mamba-FSCIL: Dynamic adaptation with selective state space model for few-shot class-incremental learning

X Li, Y Yang, J Wu, B Ghanem, L Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new
classes into a model with minimal training samples while preserving the knowledge of …

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks

J Sieber, CA Alonso, A Didier, MN Zeilinger… - arXiv preprint arXiv …, 2024 - arxiv.org
Softmax attention is the principal backbone of foundation models for various artificial
intelligence applications, yet its quadratic complexity in sequence length can limit its …

On the low-shot transferability of [V]-Mamba

D Misra, J Gala, A Orvieto - arXiv preprint arXiv:2403.10696, 2024 - arxiv.org
The strength of modern large-scale neural networks lies in their ability to efficiently adapt to
new tasks with few examples. Although extensive research has investigated the …

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

R Grazzi, J Siems, JKH Franke, A Zela, F Hutter… - arXiv preprint arXiv …, 2024 - arxiv.org
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and
DeltaNet have emerged as efficient alternatives to Transformers in large language …
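
As a hedged, one-dimensional illustration of why the sign of the recurrence matters, the sketch below tracks the parity of a bit stream with an input-dependent transition in {-1, +1}; it shows the kind of state tracking the title refers to, not the paper's LRNN parameterizations (Mamba, RWKV, GLA, mLSTM, DeltaNet).

import numpy as np

def parity_via_signed_recurrence(bits):
    # h_t = a_t * h_{t-1} with a_t = 1 - 2*b_t in {-1, +1}; the sign of h_T encodes parity.
    h = 1.0
    for b in bits:
        h = (1.0 - 2.0 * b) * h   # transition has a negative eigenvalue whenever b == 1
    return 0 if h > 0 else 1      # +1 -> even parity, -1 -> odd parity

bits = [1, 0, 1, 1, 0]
print(parity_via_signed_recurrence(bits), sum(bits) % 2)  # both print 1

# If the transition values are restricted to [0, 1], the sign of h can never flip,
# so the same one-dimensional linear recurrence cannot represent parity.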