Position Coupling: Leveraging Task Structure for Improved Length Generalization of Transformers

H Cho, J Cha, P Awasthi, S Bhojanapalli… - arXiv preprint arXiv …, 2024 - arxiv.org
Even for simple arithmetic tasks like integer addition, it is challenging for Transformers to
generalize to longer sequences than those encountered during training. To tackle this …