SHEPHERD: Serving DNNs in the wild
… SHEPHERD, a model serving system that achieves all three goals in the face of workload
unpredictability. SHEPHERD … We evaluate SHEPHERD with 17 DNN models widely used for …
Serving DNNs like clockwork: Performance predictability from the bottom up
… We thank our shepherd Junfeng Yang and the anonymous reviewers for their insightful
feedback that helped improve our work. Our work was partially supported by NSF CAREER Grant …
Towards Optimal Preemptive GPU Time-Sharing for Edge Model Serving
… PipeSwitch [8] and Shepherd [30] insert synchronization points at DNN model layer level to
… We implement Shepherd [30] by inserting the synchronization points between DNN layers …
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
… serve each LoRA model and adopt an existing model serving orchestrators like SHEPHERD
… Different from the previous DNN serving scenarios, we find that the peak capacity of a LoRA …
AlpaServe: Statistical multiplexing with model parallelism for deep learning serving
… serve the request under SLO and rejects the request if it cannot. This is possible because the
execution time of a DNN … We thank the OSDI reviewers and our shepherd, Heming Cui, for …
Exploring practical vulnerabilities of machine learning-based wireless systems
… of DNN training by 15.3%–75.8% for diverse workloads. … present SHEPHERD, a model
serving system that achieves all three goals in the face of workload unpredictability. SHEPHERD …
ServerlessLLM: Low-latency serverless inference for large language models
… LLM checkpoints are significantly larger than conventional DNN checkpoints, which leads to
… We also observe that with Shepherd*, model checkpoints are read from SSD 2X times more …
Fast distributed inference serving for large language models
… other deep neural network (DNN) model inference like ResNet [7]. DNN inference jobs are
… Existing inference serving solutions like Clockwork [8] and Shepherd [9] target deterministic …
USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
… We first used Shepherd to decide the BS (bs … We sincerely thank the anonymous reviewers
of OSDI and our shepherd for their invaluable feedback. We are grateful to Kevin Skadron for …
AdaInf: Data Drift Adaptive Scheduling for Accurate and SLO-guaranteed Multiple-Model Inference Serving at Edge Servers
… on multiple deep neural network (DNN) models deployed on … and SLO-guaranteed Inference
serving at edge servers (… reviewers and our shepherd, Ganesh Ananthanarayanan, for …