GRNN: Low-latency and scalable RNN inference on GPUs C Holmes, D Mawhirter, Y He, F Yan, B Wu Proceedings of the Fourteenth EuroSys Conference 2019, 1-16, 2019 | 58 | 2019 |
DeepSpeed-Chat: Easy, fast and affordable RLHF training of ChatGPT-like models at all scales Z Yao, RY Aminabadi, O Ruwase, S Rajbhandari, X Wu, AA Awan, ... arXiv preprint arXiv:2308.01320, 2023 | 34 | 2023 |
GraphZero: A high-performance subgraph matching system D Mawhirter, S Reinehr, C Holmes, T Liu, B Wu ACM SIGOPS Operating Systems Review 55 (1), 21-37, 2021 | 32 | 2021 |
GraphZero: Breaking symmetry for efficient graph mining D Mawhirter, S Reinehr, C Holmes, T Liu, B Wu arXiv preprint arXiv:1911.12877, 2019 | 29 | 2019 |
Dryadic: Flexible and fast graph pattern matching at scale D Mawhirter, S Reinehr, W Han, N Fields, M Claver, C Holmes, J McClurg, ... 2021 30th International Conference on Parallel Architectures and Compilation …, 2021 | 16 | 2021 |
ZeRO++: Extremely efficient collective communication for giant model training G Wang, H Qin, SA Jacobs, C Holmes, S Rajbhandari, O Ruwase, F Yan, ... arXiv preprint arXiv:2306.10209, 2023 | 14 | 2023 |
Safe and smooth: Certified continuous-time range-only localization F Dümbgen, C Holmes, TD Barfoot IEEE Robotics and Automation Letters 8 (2), 1117-1124, 2022 | 14 | 2022 |
NxMTransformer: semi-structured sparsification for natural language understanding via ADMM C Holmes, M Zhang, Y He, B Wu Advances in neural information processing systems 34, 1818-1830, 2021 | 14 | 2021 |
Renaissance: A survey into AI text-to-image generation in the era of large model F Bie, Y Yang, Z Zhou, A Ghanem, M Zhang, Z Yao, X Wu, C Holmes, ... arXiv preprint arXiv:2309.00810, 2023 | 12 | 2023 |
Towards open world NeRF-based SLAM D Lisus, C Holmes, S Waslander 2023 20th Conference on Robots and Vision (CRV), 37-44, 2023 | 12 | 2023 |
An efficient global optimality certificate for landmark-based SLAM C Holmes, TD Barfoot IEEE Robotics and Automation Letters 8 (3), 1539-1546, 2023 | 11 | 2023 |
Random-LTD: Random and layerwise token dropping brings efficient training for large-scale transformers Z Yao, X Wu, C Li, C Holmes, M Zhang, C Li, Y He arXiv preprint arXiv:2211.11586, 2022 | 11 | 2022 |
Toward globally optimal state estimation using automatically tightened semidefinite relaxations F Dümbgen, C Holmes, B Agro, TD Barfoot arXiv preprint arXiv:2308.05783, 2023 | 9 | 2023 |
DeepSpeed data efficiency: Improving deep learning model quality and training efficiency via efficient data sampling and routing C Li, Z Yao, X Wu, M Zhang, C Holmes, C Li, Y He Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 18490 …, 2024 | 6 | 2024 |
On Semidefinite Relaxations for Matrix-Weighted State-Estimation Problems in Robotics C Holmes, F Dümbgen, TD Barfoot arXiv preprint arXiv:2308.07275, 2023 | 6 | 2023 |
DeepSpeed-FastGen: High-throughput text generation for LLMs via MII and DeepSpeed-Inference C Holmes, M Tanaka, M Wyatt, AA Awan, J Rasley, S Rajbhandari, ... arXiv preprint arXiv:2401.08671, 2024 | 5 | 2024 |
Certifiably Optimal Rotation and Pose Estimation Based on the Cayley Map TD Barfoot, C Holmes, F Dümbgen arXiv preprint arXiv:2308.12418, 2023 | 4 | 2023 |
DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies SL Song, B Kruft, M Zhang, C Li, S Chen, C Zhang, M Tanaka, X Wu, ... arXiv preprint arXiv:2310.04610, 2023 | 2 | 2023 |
A fine line: Total least-squares line fitting as QCQP optimization TD Barfoot, C Holmes, F Dümbgen arXiv preprint arXiv:2206.05082, 2022 | 2 | 2022 |
Exploiting Chordal Sparsity for Fast Global Optimality with Application to Localization F Dümbgen, C Holmes, TD Barfoot arXiv preprint arXiv:2406.02365, 2024 | 1 | 2024 |