Online speculative decoding X Liu, L Hu, P Bailis, A Cheung, Z Deng, I Stoica, H Zhang arXiv preprint arXiv:2310.07177, 2023 | 39 | 2023 |
Order-preserving key compression for in-memory search trees H Zhang, X Liu, DG Andersen, M Kaminsky, K Keeton, A Pavlo Proceedings of the 2020 ACM SIGMOD International Conference on Management of …, 2020 | 30 | 2020 |
GACT: Activation compressed training for generic network architectures X Liu, L Zheng, D Wang, Y Cen, W Chen, X Han, J Chen, Z Liu, J Tang, ... International Conference on Machine Learning, 14139-14152, 2022 | 25 | 2022 |
Leveraging application data constraints to optimize database-backed web applications X Liu, S Wang, M Sun, S Pan, G Li, S Jha, C Yan, J Yang, S Lu, A Cheung arXiv preprint arXiv:2205.02954, 2022 | 6 | 2022 |
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native Y Lu, S Bian, L Chen, Y He, Y Hui, M Lentz, B Li, F Liu, J Li, Q Liu, R Liu, ... arXiv preprint arXiv:2401.12230, 2024 | 5 | 2024 |
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources Z Li, X Liu, B Zhu, Z Dong, Q Gu, K Keutzer arXiv preprint arXiv:2310.07147, 2023 | 4 | 2023 |
Mélange: Cost Efficient Large Language Model Serving by Exploiting GPU Heterogeneity T Griggs, X Liu, J Yu, D Kim, WL Chiang, A Cheung, I Stoica arXiv preprint arXiv:2404.14527, 2024 | 3 | 2024 |
Optimizing Speculative Decoding for Serving Large Language Models Using Goodput X Liu, C Daniel, L Hu, W Kwon, Z Li, X Mo, A Cheung, Z Deng, I Stoica, ... arXiv preprint arXiv:2406.14066, 2024 | 1 | 2024 |
Learned Best-Effort LLM Serving S Jha, C Hooper, X Liu, S Kim, K Keutzer arXiv preprint arXiv:2401.07886, 2024 | 1 | 2024 |
Towards Auto-Generated Data Systems A Cheung, MBS Ahmad, B Haynes, C Kittivorawong, S Laddad, X Liu, ... Proceedings of the VLDB Endowment 16 (12), 4116-4129, 2023 | | 2023 |