| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Y Sheng, L Zheng, B Yuan, Z Li, M Ryabinin, B Chen, P Liang, C Ré, ... | International Conference on Machine Learning | 167 | 2023 |
| Efficient streaming language models with attention sinks | G Xiao, Y Tian, B Chen, S Han, M Lewis | arXiv preprint arXiv:2309.17453 | 148 | 2023 |
| SLIDE: In Defense of Smart Algorithms over Hardware Acceleration for Large-scale Deep Learning Systems | B Chen, T Medini, J Farwell, S Gobriel, C Tai, A Shrivastava | Proceedings of Machine Learning and Systems 2, 291-306 | 127 | 2020 |
| Deja Vu: Contextual sparsity for efficient LLMs at inference time | Z Liu, J Wang, T Dao, T Zhou, B Yuan, Z Song, A Shrivastava, C Zhang, ... | International Conference on Machine Learning, 22137-22176 | 112 | 2023 |
| Scatterbrain: Unifying sparse and low-rank attention | B Chen, T Dao, E Winsor, Z Song, A Rudra, C Ré | Advances in Neural Information Processing Systems 34, 17413-17426 | 93 | 2021 |
| H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Z Zhang, Y Sheng, T Zhou, T Chen, L Zheng, R Cai, Z Song, Y Tian, C Ré, ... | International Conference on Machine Learning | 82 | 2023 |
| MONGOOSE: A learnable LSH framework for efficient neural network training | B Chen, Z Liu, B Peng, Z Xu, JL Li, T Dao, Z Song, A Shrivastava, C Ré | International Conference on Learning Representations | 70 | 2021 |
| Monarch: Expressive structured matrices for efficient and accurate training | T Dao, B Chen, NS Sohoni, A Desai, M Poli, J Grogan, A Liu, A Rao, ... | International Conference on Machine Learning, 4690-4721 | 69 | 2022 |
| Analyzing log analysis: An empirical study of user log mining | S Alspaugh, B Chen, J Lin, A Ganapathi, M Hearst, R Katz | 28th Large Installation System Administration Conference (LISA14), 62-77 | 67 | 2014 |
| Pixelated butterfly: Simple and efficient sparse training for neural network models | B Chen, T Dao, K Liang, J Yang, Z Song, A Rudra, C Ré | International Conference on Learning Representations | 65* | 2022 |
| Decentralized training of foundation models in heterogeneous environments | B Yuan, Y He, JQ Davis, T Zhang, T Dao, B Chen, P Liang, C Ré, C Zhang | Advances in Neural Information Processing Systems | 58 | 2022 |
| Angular visual hardness | B Chen, W Liu, Z Yu, J Kautz, A Shrivastava, A Garg, A Anandkumar | International Conference on Machine Learning, 1637-1648 | 50 | 2020 |
| Fast and accurate stochastic gradient estimation | B Chen, Y Xu, A Shrivastava | Advances in Neural Information Processing Systems 32 | 48* | 2019 |
| Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer | Y Tian, Y Wang, B Chen, S Du | International Conference on Machine Learning | 44 | 2023 |
| Unique entity estimation with application to the Syrian conflict | B Chen, A Shrivastava, RC Steorts | The Annals of Applied Statistics 12 (2), 1039-1067 | 36 | 2018 |
| Sub-linear privacy-preserving near-neighbor search | MS Riazi, B Chen, A Shrivastava, D Wallach, F Koushanfar | arXiv preprint arXiv:1612.01835 | 25* | 2016 |
| Densified winner take all (WTA) hashing for sparse datasets | B Chen, A Shrivastava | Uncertainty in Artificial Intelligence | 23* | 2018 |
| JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention | Y Tian, Y Wang, Z Zhang, B Chen, S Du | arXiv preprint arXiv:2310.00535 | 20 | 2023 |
| CocktailSGD: Fine-tuning foundation models over 500Mbps networks | J Wang, Y Lu, B Yuan, B Chen, P Liang, C De Sa, C Ré, C Zhang | International Conference on Machine Learning, 36058-36076 | 19 | 2023 |
| Compress, then prompt: Improving accuracy-efficiency trade-off of LLM inference with transferable prompt | Z Xu, Z Liu, B Chen, Y Tang, J Wang, K Zhou, X Hu, A Shrivastava | arXiv preprint arXiv:2305.11186 | 17 | 2023 |