Resource allocation of industry 4.0 micro-service applications across serverless fog federation

RF Hussain, MA Salehi - Future Generation Computer Systems, 2024 - Elsevier
The Industry 4.0 revolution has been made possible via AI-based applications (e.g., for
automation and maintenance) deployed on the serverless edge (aka fog) computing …

Is Machine Learning Necessary for Cloud Resource Usage Forecasting?

G Christofidi, K Papaioannou, TD Doudali - Proceedings of the 2023 …, 2023 - dl.acm.org
Robust forecasts of future resource usage in cloud computing environments enable high
efficiency in resource management solutions, such as autoscaling and overcommitment …

Challenges and opportunities of using transformer-based multi-task learning in NLP through ML lifecycle: A position paper

L Torbarina, T Ferkovic, L Roguski, V Mihelcic… - Natural Language …, 2024 - Elsevier
The increasing adoption of natural language processing (NLP) models across industries has
led to practitioners' need for machine learning (ML) systems to handle these models …

IPA: Inference Pipeline Adaptation to Achieve High Accuracy and Cost-Efficiency

S Ghafouri, K Razavi, M Salmani, A Sanaee… - arXiv preprint arXiv …, 2023 - arxiv.org
Efficiently optimizing multi-model inference pipelines for fast, accurate, and cost-effective
inference is a crucial challenge in ML production systems, given their tight end-to-end …

CAMEO: A Causal Transfer Learning Approach for Performance Optimization of Configurable Computer Systems

MS Iqbal, Z Zhong, I Ahmad, B Ray… - Proceedings of the 2023 …, 2023 - dl.acm.org
Modern computer systems are highly configurable, with hundreds of configuration options
that interact, resulting in an enormous configuration space. As a result, optimizing …

Sponge: Inference Serving with Dynamic SLOs Using In-Place Vertical Scaling

K Razavi, S Ghafouri, M Mühlhäuser… - Proceedings of the 4th …, 2024 - dl.acm.org
Mobile and IoT applications increasingly adopt deep learning inference to provide
intelligence. Inference requests are typically sent to a cloud infrastructure over a wireless …

A Tale of Two Scales: Reconciling Horizontal and Vertical Scaling for Inference Serving Systems

K Razavi, M Salmani, M Mühlhäuser… - arXiv preprint arXiv …, 2024 - arxiv.org
Inference serving is of great importance in deploying machine learning models in real-world
applications, ensuring efficient processing and quick responses to inference requests …

Smart-Kube: Energy-Aware and Fair Kubernetes Job Scheduler Using Deep Reinforcement Learning

S Ghafouri, S Abdipoor, J Doyle - 2023 IEEE 8th International …, 2023 - ieeexplore.ieee.org
One of the most challenging problems in the popular orchestration framework Kubernetes is
assigning sufficient resources to containers to operate at a required level while also …

Machine Learning in Container Orchestration Systems: Applications and Deployment

S Ghafouri - 2024 - qmro.qmul.ac.uk
In recent years, machine learning methods, particularly Reinforcement Learning, have
become increasingly popular for addressing resource management challenges, notably in …