GQA: Training generalized multi-query transformer models from multi-head checkpoints J Ainslie, J Lee-Thorp, M de Jong, Y Zemlyanskiy, F Lebrón, S Sanghai arXiv preprint arXiv:2305.13245, 2023 | 170 | 2023 |
CoLT5: Faster long-range transformers with conditional computation J Ainslie, T Lei, M de Jong, S Ontañón, S Brahma, Y Zemlyanskiy, ... arXiv preprint arXiv:2303.09752, 2023 | 44 | 2023 |
Towards a robust interactive and learning social robot M De Jong, K Zhang, AM Roth, T Rhodes, R Schmucker, C Zhou, ... Proceedings of the 17th International Conference on Autonomous Agents and …, 2018 | 43 | 2018 |
Mention memory: incorporating textual knowledge into transformers through entity mention attention M De Jong, Y Zemlyanskiy, N FitzGerald, F Sha, W Cohen arXiv preprint arXiv:2110.06176, 2021 | 40 | 2021 |
Augmenting pre-trained language models with QA-memory for open-domain question answering W Chen, P Verga, M De Jong, J Wieting, W Cohen arXiv preprint arXiv:2204.04581, 2022 | 23 | 2022 |
FiDO: Fusion-in-decoder optimized for stronger performance and faster inference M de Jong, Y Zemlyanskiy, J Ainslie, N FitzGerald, S Sanghai, F Sha, ... arXiv preprint arXiv:2212.08153, 2022 | 19 | 2022 |
ReadTwice: Reading Very Large Documents with Memories Y Zemlyanskiy, J Ainslie, M de Jong, P Pham, I Eckstein, F Sha arXiv preprint arXiv:2105.04241, 2021 | 13 | 2021 |
Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing Y Zemlyanskiy, M de Jong, J Ainslie, P Pasupat, P Shaw, L Qiu, ... arXiv preprint arXiv:2209.14899, 2022 | 11 | 2022 |
QA is the new KR: Question-answer pairs as knowledge bases WW Cohen, W Chen, M De Jong, N Gupta, A Presta, P Verga, J Wieting Proceedings of the AAAI Conference on Artificial Intelligence 37 (13), 15385 …, 2023 | 5 | 2023 |
Pre-computed memory or on-the-fly encoding? A hybrid approach to retrieval augmentation makes the most of your compute M De Jong, Y Zemlyanskiy, N FitzGerald, J Ainslie, S Sanghai, F Sha, ... International Conference on Machine Learning, 7329-7342, 2023 | 4 | 2023 |
Neural theorem provers do not learn rules without exploration M de Jong, F Sha arXiv preprint arXiv:1906.06805, 2019 | 3 | 2019 |
GLIMMER: generalized late-interaction memory reranker M de Jong, Y Zemlyanskiy, N FitzGerald, S Sanghai, WW Cohen, J Ainslie arXiv preprint arXiv:2306.10231, 2023 | 2 | 2023 |
MEMORY-VQ: Compression for Tractable Internet-Scale Memory Y Zemlyanskiy, M de Jong, L Vilnis, S Ontañón, WW Cohen, S Sanghai, ... arXiv preprint arXiv:2308.14903, 2023 | | 2023 |
Generate-and-Retrieve: use your predictions to improve retrieval for semantic parsing F Sha, I Pasupat, J Ainslie, P Shaw, SK Sanghai, Y Zemlyanskiy, L Qiu, ... | | 2022 |
Weighted Global Normalization for Multiple Choice Reading Comprehension over Long Documents A Chaudhary, B Paranjape, M de Jong arXiv preprint arXiv:1812.02253, 2018 | | 2018 |