SHEPHERD: Serving DNNs in the wild

H Zhang, Y Tang, A Khandelwal, I Stoica - 20th USENIX Symposium on …, 2023 - usenix.org
SHEPHERD, a model serving system that achieves all three goals in the face of workload
unpredictability. SHEPHERD … We evaluate SHEPHERD with 17 DNN models widely used for …

Serving DNNs like clockwork: Performance predictability from the bottom up

A Gujarati, R Karimi, S Alzayat, W Hao… - … USENIX Symposium on …, 2020 - usenix.org
… We thank our shepherd Junfeng Yang and the anonymous reviewers for their insightful
feedback that helped improve our work. Our work was partially supported by NSF CAREER Grant …

Towards Optimal Preemptive GPU Time-Sharing for Edge Model Serving

Z Xia, Y Hao, J Duan, C Wang, J Jiang - Proceedings of the 9th …, 2023 - dl.acm.org
… PipeSwitch [8] and Shepherd [30] insert synchronization points at DNN model layer level to
… We implement Shepherd [30] by inserting the synchronization points between DNN layers …

dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving

B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin - 18th USENIX Symposium on …, 2024 - usenix.org
serve each LoRA model and adopt an existing model serving orchestrator like SHEPHERD
… Different from the previous DNN serving scenarios, we find that the peak capacity of a LoRA …

AlpaServe: Statistical multiplexing with model parallelism for deep learning serving

Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin… - … USENIX Symposium on …, 2023 - usenix.org
serve the request under SLO and rejects the request if it cannot. This is possible because the
execution time of a DNN … We thank the OSDI reviewers and our shepherd, Heming Cui, for …

Exploring practical vulnerabilities of machine learning-based wireless systems

Z Liu, C Xu, Y Xie, E Sie, F Yang, K Karwaski… - … USENIX Symposium on …, 2023 - usenix.org
… of DNN training by 15.3%–75.8% for diverse workloads. … present SHEPHERD, a model
serving system that achieves all three goals in the face of workload unpredictability. SHEPHERD

ServerlessLLM: Low-latency serverless inference for large language models

Y Fu, L Xue, Y Huang, AO Brabete… - … Systems Design and …, 2024 - research.ed.ac.uk
… LLM checkpoints are significantly larger than conventional DNN checkpoints, which leads to
… We also observe that with Shepherd*, model checkpoints are read from SSD 2X more …

Fast distributed inference serving for large language models

B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
… other deep neural network (DNN) model inference like ResNet [7]. DNN inference jobs are
… Existing inference serving solutions like Clockwork [8] and Shepherd [9] target deterministic …

USHER: Holistic Interference Avoidance for Resource Optimized ML Inference

SS Shubha, H Shen, A Iyer - 18th USENIX Symposium on Operating …, 2024 - usenix.org
… We first used Shepherd to decide the BS (bs … We sincerely thank the anonymous reviewers
of OSDI and our shepherd for their invaluable feedback. We are grateful to Kevin Skadron for …

AdaInf: Data Drift Adaptive Scheduling for Accurate and SLO-guaranteed Multiple-Model Inference Serving at Edge Servers

SS Shubha, H Shen - Proceedings of the ACM SIGCOMM 2023 …, 2023 - dl.acm.org
… on multiple deep neural network (DNN) models deployed on … and SLO-guaranteed inference
serving at edge servers (… reviewers and our shepherd, Ganesh Ananthanarayanan, for …