Solving quantitative reasoning problems with language models A Lewkowycz, A Andreassen, D Dohan, E Dyer, H Michalewski, ... Advances in Neural Information Processing Systems 35, 3843-3857, 2022 | 447 | 2022 |
Linear Transformers are Secretly Fast Weight Programmers I Schlag*, K Irie*, J Schmidhuber International Conference on Machine Learning, 9355-9366, 2021 | 197* | 2021 |
Block-Recurrent Transformers DL Hutchins*, I Schlag*, Y Wu, E Dyer, B Neyshabur arXiv preprint arXiv:2203.07852, 2022 | 94 | 2022 |
Learning to reason with third order tensor products I Schlag, J Schmidhuber Advances in Neural Information Processing Systems 31, 9981-9993, 2018 | 79 | 2018 |
Enhancing the transformer with explicit relational encoding for math problem solving I Schlag, P Smolensky, R Fernandez, N Jojic, J Schmidhuber, J Gao arXiv preprint arXiv:1910.06611, 2019 | 70 | 2019 |
Going beyond linear transformers with recurrent fast weight programmers K Irie*, I Schlag*, R Csordás, J Schmidhuber Advances in Neural Information Processing Systems 34, 2021 | 59 | 2021 |
Mindstorms in Natural Language-Based Societies of Mind M Zhuge, H Liu, F Faccio, DR Ashley, R Csordás, A Gopalakrishnan, ... arXiv preprint arXiv:2305.17066, 2023 | 42 | 2023 |
Ancient Roman coin recognition in the wild using deep learning based recognition of artistically depicted face profiles I Schlag, O Arandjelovic Proceedings of the IEEE International Conference on Computer Vision …, 2017 | 42 | 2017 |
Learning Associative Inference Using Fast Weight Memory I Schlag, T Munkhdalai, J Schmidhuber International Conference on Learning Representations, 2021 | 41 | 2021 |
A Modern Self-Referential Weight Matrix That Learns to Modify Itself K Irie, I Schlag, R Csordás, J Schmidhuber Deep RL Workshop NeurIPS 2021, 2021 | 32 | 2021 |
Solving quantitative reasoning problems with language models, 2022 A Lewkowycz, A Andreassen, D Dohan, E Dyer, H Michalewski, ... URL https://arxiv.org/abs/2206.14858 | 32* | |
Gated fast weights for on-the-fly neural program generation I Schlag, J Schmidhuber NIPS Metalearning Workshop, 2017 | 31 | 2017 |
Large Language Model Programs I Schlag, S Sukhbaatar, A Celikyilmaz, W Yih, J Weston, J Schmidhuber, ... arXiv preprint arXiv:2305.05364, 2023 | 12 | 2023 |
Block-recurrent transformers (2022) DL Hutchins, I Schlag, Y Wu, E Dyer, B Neyshabur URL https://arxiv.org/abs/2203.07852 | 4 | |
The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute A Stanić, D Ashley, O Serikov, L Kirsch, F Faccio, J Schmidhuber, ... arXiv preprint arXiv:2309.11197, 2023 | 3 | 2023 |
Navigating Scaling Laws: Accelerating Vision Transformer's Training via Adaptive Strategies S Anagnostidis, G Bachmann, I Schlag, T Hofmann arXiv preprint arXiv:2311.03233, 2024 | 2 | 2024 |
Improving Baselines in the Wild K Irie, I Schlag, R Csordás, J Schmidhuber NeurIPS 2021 Workshop on Distribution Shifts: Connecting Methods and …, 2021 | 2 | 2021 |
Augmenting Classic Algorithms with Neural Components for Strong Generalisation on Ambiguous and High-Dimensional Data I Schlag, J Schmidhuber Advances in Programming Languages and Neurosymbolic Systems Workshop, 2021 | 1 | 2021 |
Understanding and Minimising Outlier Features in Neural Network Training B He, L Noci, D Paliotta, I Schlag, T Hofmann arXiv preprint arXiv:2405.19279, 2024 | | 2024 |
Language Imbalance Can Boost Cross-lingual Generalisation A Schäfer, S Ravfogel, T Hofmann, T Pimentel, I Schlag arXiv preprint arXiv:2404.07982, 2024 | | 2024 |