Broken Neural Scaling Laws. E. Caballero, K. Gupta, I. Rish, D. Krueger. arXiv preprint arXiv:2210.14891, 2022. Cited by 50.
IlliniMet: Illinois System for Metaphor Detection with Contextual and Linguistic Information. H. Gong, K. Gupta, A. Jain, S. Bhat. Proceedings of the Second Workshop on Figurative Language Processing, pp. 146-153, 2020. Cited by 44.
ARB: Advanced Reasoning Benchmark for Large Language Models. T. Sawada, D. Paleka, A. Havrilla, P. Tadepalli, P. Vidas, A. Kranias, J. J. Nay, et al. arXiv preprint arXiv:2307.13692, 2023. Cited by 31.
Continual Pre-Training of Large Language Models: How to (re)warm your model? K. Gupta*, B. Thérien*, A. Ibrahim*, M. L. Richter, Q. Anthony, E. Belilovsky, et al. arXiv preprint arXiv:2308.04014, 2023. Cited by 28.
Temporal Latent Bottleneck: Synthesis of Fast and Slow Processing Mechanisms in Sequence Learning. A. Didolkar, K. Gupta, A. Goyal, N. B. Gundavarapu, A. M. Lamb, N. R. Ke, et al. Advances in Neural Information Processing Systems 35, pp. 10505-10520, 2022. Cited by 9.
Simple and Scalable Strategies to Continually Pre-train Large Language Models. A. Ibrahim*, B. Thérien*, K. Gupta*, M. L. Richter, Q. Anthony, T. Lesort, et al. arXiv preprint arXiv:2403.08763, 2024. Cited by 7.
Aurora-M: The First Open Source Multilingual Language Model Red-teamed according to the US Executive Order. T. Nakamura, M. Mishra, S. Tedeschi, Y. Chai, J. T. Stillerman, F. Friedrich, et al. arXiv preprint arXiv:2404.00399, 2024. Cited by 2.