Understanding the reasoning ability of language models from the perspective of reasoning paths aggregation

X Wang, A Amayuelas, K Zhang, L Pan, W Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Pre-trained language models (LMs) are able to perform complex reasoning without explicit
fine-tuning. To understand how pre-training with a next-token prediction objective …

On provable length and compositional generalization

K Ahuja, A Mansouri - arXiv preprint arXiv:2402.04875, 2024 - arxiv.org
Length generalization--the ability to generalize to longer sequences than those seen during
training--and compositional generalization--the ability to generalize to token combinations …

Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

H Cho, J Cha, P Awasthi, S Bhojanapalli… - arXiv preprint arXiv …, 2024 - arxiv.org
Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to
generalize to longer sequences than those encountered during training. To tackle this …

Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers

MR Ebrahimi, S Panchal, R Memisevic - arXiv preprint arXiv:2408.05506, 2024 - arxiv.org
Despite their recent successes, Transformer-based large language models show surprising
failure modes. A well-known example of such failure modes is their inability to length …