Hierarchically gated recurrent neural network for sequence modeling

Z Qin, S Yang, Y Zhong - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …

Griffin: Mixing gated linear recurrences with local attention for efficient language models

S De, SL Smith, A Fernando, A Botev… - arXiv preprint arXiv …, 2024 - arxiv.org
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long
sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with …
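
As a rough illustration of the recurrence family named in the title, the sketch below implements a generic diagonal gated linear recurrence in NumPy; the parameterization (a per-channel sigmoid forget gate mixing the previous state with the current input) is an assumption for exposition, not the Hawk/Griffin layer from the paper.

import numpy as np

def gated_linear_recurrence(x, w_gate, b_gate):
    # x: (T, d) input sequence; returns (T, d) hidden states.
    # h_t = a_t * h_{t-1} + (1 - a_t) * x_t, with a_t = sigmoid(x_t @ w_gate + b_gate).
    T, d = x.shape
    h = np.zeros(d)
    out = np.empty_like(x)
    for t in range(T):
        a = 1.0 / (1.0 + np.exp(-(x[t] @ w_gate + b_gate)))  # per-channel forget gate in (0, 1)
        h = a * h + (1.0 - a) * x[t]                          # element-wise (diagonal) recurrence
        out[t] = h
    return out

rng = np.random.default_rng(0)
T, d = 8, 4
y = gated_linear_recurrence(rng.normal(size=(T, d)),
                            rng.normal(size=(d, d)) * 0.1,
                            np.zeros(d))
print(y.shape)  # (8, 4)

Because the recurrence is element-wise and linear in h, layers of this form admit parallel scan training while keeping constant-memory sequential inference, which is the trade-off the abstract alludes to.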

Convolutional state space models for long-range spatiotemporal modeling

J Smith, S De Mello, J Kautz… - Advances in Neural …, 2024 - proceedings.neurips.cc
Effectively modeling long spatiotemporal sequences is challenging due to the need to model
complex spatial correlations and long-range temporal dependencies simultaneously …

Gated recurrent neural networks discover attention

N Zucchet, S Kobayashi, Y Akram… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent architectural developments have enabled recurrent neural networks (RNNs) to reach
and even surpass the performance of Transformers on certain sequence modeling tasks …

Recurrent Distance Filtering for Graph Representation Learning

Y Ding, A Orvieto, B He, T Hofmann - Forty-first International …, 2024 - openreview.net
Graph neural networks based on iterative one-hop message passing have been shown to
struggle in harnessing the information from distant nodes effectively. Conversely, graph …

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Y Chou, M Yao, K Wang, Y Pan, R Zhu, Y Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
Various linear complexity models, such as Linear Transformer (LinFormer), State Space
Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional …
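
For context on the model class being unified, the following sketch contrasts quadratic softmax attention with a kernelized linear-attention approximation in NumPy; the feature map phi (elu + 1) and the single-head setup are illustrative assumptions, not MetaLA's construction.

import numpy as np

def softmax_attention(q, k, v):
    scores = q @ k.T / np.sqrt(q.shape[-1])          # (T, T): quadratic in sequence length
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def linear_attention(q, k, v, phi=lambda x: np.where(x > 0, x + 1.0, np.exp(x))):
    qp, kp = phi(q), phi(k)                           # non-negative feature maps
    kv = kp.T @ v                                     # (d, d_v): accumulated once, linear in T
    z = qp @ kp.sum(axis=0)                           # per-query normalizer
    return (qp @ kv) / z[:, None]

rng = np.random.default_rng(0)
T, d = 16, 8
q, k, v = (rng.normal(size=(T, d)) for _ in range(3))
print(softmax_attention(q, k, v).shape, linear_attention(q, k, v).shape)  # (16, 8) (16, 8)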

Mamba-FSCIL: Dynamic adaptation with selective state space model for few-shot class-incremental learning

X Li, Y Yang, J Wu, B Ghanem, L Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new
classes into a model with minimal training samples while preserving the knowledge of …

Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks

J Sieber, CA Alonso, A Didier, MN Zeilinger… - arXiv preprint arXiv …, 2024 - arxiv.org
Softmax attention is the principal backbone of foundation models for various artificial
intelligence applications, yet its quadratic complexity in sequence length can limit its …

On the low-shot transferability of [V]-Mamba

D Misra, J Gala, A Orvieto - arXiv preprint arXiv:2403.10696, 2024 - arxiv.org
The strength of modern large-scale neural networks lies in their ability to efficiently adapt to
new tasks with few examples. Although extensive research has investigated the …

Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues

R Grazzi, J Siems, JKH Franke, A Zela, F Hutter… - arXiv preprint arXiv …, 2024 - arxiv.org
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and
DeltaNet have emerged as efficient alternatives to Transformers in large language …
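
As a hedged, one-dimensional illustration of why the sign of the recurrence matters, the sketch below tracks the parity of a bit stream with an input-dependent transition in {-1, +1}; it shows the kind of state tracking the title refers to, not the paper's LRNN parameterizations (Mamba, RWKV, GLA, mLSTM, DeltaNet).

import numpy as np

def parity_via_signed_recurrence(bits):
    # h_t = a_t * h_{t-1} with a_t = 1 - 2*b_t in {-1, +1}; the sign of h_T encodes parity.
    h = 1.0
    for b in bits:
        h = (1.0 - 2.0 * b) * h   # transition has a negative eigenvalue whenever b == 1
    return 0 if h > 0 else 1      # +1 -> even parity, -1 -> odd parity

bits = [1, 0, 1, 1, 0]
print(parity_via_signed_recurrence(bits), sum(bits) % 2)  # both print 1

# If the transition values are restricted to [0, 1], the sign of h can never flip,
# so the same one-dimensional linear recurrence cannot represent parity.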