Understanding the reasoning ability of language models from the perspective of reasoning paths aggregation
Pre-trained language models (LMs) are able to perform complex reasoning without explicit
fine-tuning. To understand how pre-training with a next-token prediction objective …
On provable length and compositional generalization
K Ahuja, A Mansouri - arXiv preprint arXiv:2402.04875, 2024 - arxiv.org
Length generalization--the ability to generalize to longer sequences than ones seen during
training, and compositional generalization--the ability to generalize to token combinations …
Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers
Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to
generalize to longer sequences than those encountered during training. To tackle this …
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers
MR Ebrahimi, S Panchal, R Memisevic - arXiv preprint arXiv:2408.05506, 2024 - arxiv.org
Despite their recent successes, Transformer-based large language models show surprising
failure modes. A well-known example of such failure modes is their inability to length …