Hierarchically gated recurrent neural network for sequence modeling
Transformers have surpassed RNNs in popularity due to their superior abilities in parallel
training and long-term dependency modeling. Recently, there has been a renewed interest …
Griffin: Mixing gated linear recurrences with local attention for efficient language models
Recurrent neural networks (RNNs) have fast inference and scale efficiently on long
sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with …
Convolutional state space models for long-range spatiotemporal modeling
Effectively modeling long spatiotemporal sequences is challenging due to the need to model
complex spatial correlations and long-range temporal dependencies simultaneously …
Gated recurrent neural networks discover attention
N Zucchet, S Kobayashi, Y Akram… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent architectural developments have enabled recurrent neural networks (RNNs) to reach
and even surpass the performance of Transformers on certain sequence modeling tasks …
Recurrent Distance Filtering for Graph Representation Learning
Graph neural networks based on iterative one-hop message passing have been shown to
struggle in harnessing the information from distant nodes effectively. Conversely, graph …
MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map
Various linear complexity models, such as Linear Transformer (LinFormer), State Space
Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional …
Mamba-fscil: Dynamic adaptation with selective state space model for few-shot class-incremental learning
Few-shot class-incremental learning (FSCIL) confronts the challenge of integrating new
classes into a model with minimal training samples while preserving the knowledge of …
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks
Softmax attention is the principal backbone of foundation models for various artificial
intelligence applications, yet its quadratic complexity in sequence length can limit its …
On the low-shot transferability of [V]-Mamba
The strength of modern large-scale neural networks lies in their ability to efficiently adapt to
new tasks with few examples. Although extensive research has investigated the …
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
Linear Recurrent Neural Networks (LRNNs) such as Mamba, RWKV, GLA, mLSTM, and
DeltaNet have emerged as efficient alternatives to Transformers in large language …