FlashAttention: Fast and memory-efficient exact attention with IO-awareness T Dao, D Fu, S Ermon, A Rudra, C Ré Advances in Neural Information Processing Systems 35, 16344-16359, 2022 | 957 | 2022 |
Mamba: Linear-time sequence modeling with selective state spaces A Gu, T Dao Conference on Language Modeling (COLM), 2024 | 554 | 2024 |
StarCoder: May the source be with you! R Li, LB Allal, Y Zi, N Muennighoff, D Kocetkov, C Mou, M Marone, C Akiki, ... Transactions on Machine Learning Research (TMLR), 2023 | 550* | 2023 |
FlashAttention-2: Faster attention with better parallelism and work partitioning T Dao International Conference on Learning Representations, 2024 | 304 | 2024 |
HiPPO: Recurrent memory with optimal polynomial projections A Gu, T Dao, S Ermon, A Rudra, C Ré Advances in Neural Information Processing Systems 33, 1474-1487, 2020 | 261 | 2020 |
Combining recurrent, convolutional, and continuous-time models with linear state space layers A Gu, I Johnson, K Goel, K Saab, T Dao, A Rudra, C Ré Advances in Neural Information Processing Systems 34, 572-585, 2021 | 251 | 2021 |
Hungry Hungry Hippos: Towards Language Modeling with State Space Models DY Fu, T Dao, KK Saab, AW Thomas, A Rudra, C Ré The Eleventh International Conference on Learning Representations, 2023 | 231 | 2023 |
A kernel theory of modern data augmentation T Dao, A Gu, A Ratner, V Smith, CD Sa, C Ré Proceedings of the 36th International Conference on Machine Learning (ICML), 2019 | 205 | 2019 |
Hyena Hierarchy: Towards Larger Convolutional Language Models M Poli, S Massaroli, E Nguyen, DY Fu, T Dao, S Baccus, Y Bengio, ... International Conference on Machine Learning, 2023 | 172 | 2023 |
Deja Vu: Contextual sparsity for efficient LLMs at inference time Z Liu, J Wang, T Dao, T Zhou, B Yuan, Z Song, A Shrivastava, C Zhang, ... International Conference on Machine Learning, 22137-22176, 2023 | 118 | 2023 |
S4ND: Modeling images and videos as multidimensional signals with state spaces E Nguyen, K Goel, A Gu, G Downs, P Shah, T Dao, S Baccus, C Ré Advances in Neural Information Processing Systems 35, 2846-2861, 2022 | 110 | 2022 |
Learning fast algorithms for linear transforms using butterfly factorizations T Dao, A Gu, M Eichhorn, A Rudra, C Ré International Conference on Machine Learning, 1517-1527, 2019 | 106 | 2019 |
Scatterbrain: Unifying sparse and low-rank attention B Chen, T Dao, E Winsor, Z Song, A Rudra, C Ré Advances in Neural Information Processing Systems 34, 17413-17426, 2021 | 98 | 2021 |
Monarch: Expressive structured matrices for efficient and accurate training T Dao, B Chen, NS Sohoni, A Desai, M Poli, J Grogan, A Liu, A Rao, ... International Conference on Machine Learning, 4690-4721, 2022 | 71 | 2022 |
MONGOOSE: A learnable LSH framework for efficient neural network training B Chen, Z Liu, B Peng, Z Xu, JL Li, T Dao, Z Song, A Shrivastava, C Ré International Conference on Learning Representations, 2021 | 71 | 2021 |
Pixelated Butterfly: Simple and efficient sparse training for neural network models T Dao, B Chen, K Liang, J Yang, Z Song, A Rudra, C Ré International Conference on Learning Representations, 2022 | 65 | 2022 |
Decentralized training of foundation models in heterogeneous environments B Yuan, Y He, J Davis, T Zhang, T Dao, B Chen, PS Liang, C Ré, C Zhang Advances in Neural Information Processing Systems 35, 25464-25477, 2022 | 62 | 2022 |
Gaussian quadrature for kernel features T Dao, CM De Sa, C Ré Advances in Neural Information Processing Systems 30, 2017 | 60 | 2017 |
StarCoder 2 and The Stack v2: The next generation A Lozhkov, R Li, LB Allal, F Cassano, J Lamy-Poirier, N Tazi, A Tang, ... arXiv preprint arXiv:2402.19173, 2024 | 53 | 2024 |
Learning compressed transforms with low displacement rank A Thomas, A Gu, T Dao, A Rudra, C Ré Advances in Neural Information Processing Systems 31, 2018 | 52 | 2018 |