AlpaServe: Statistical multiplexing with model parallelism for deep learning serving

Z Li, L Zheng, Y Zhong, V Liu, Y Sheng, X Jin… - … USENIX Symposium on …, 2023 - usenix.org
Model parallelism is conventionally viewed as a method to scale a single large deep
learning model beyond the memory limits of a single device. In this paper, we demonstrate …

Optimizing the cloud? Don't train models. Build oracles!

T Bang, C Power, S Ameli, N Crooks… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose cloud oracles, an alternative to machine learning for online optimization of
cloud configurations. Our cloud oracle approach guarantees complete accuracy and …

Zeal: Rethinking Large-Scale Resource Allocation with "Decouple and Decompose"

Z Xu, FY Yan, M Yu - arXiv preprint arXiv:2412.11447, 2024 - arxiv.org
Resource allocation is fundamental for cloud systems to ensure efficient resource sharing
among tenants. However, the scale of such optimization problems has outgrown the …

SkyPIE: A Fast & Accurate Oracle for Object Placement

T Bang, C Douglas, N Crooks… - Proceedings of the ACM on …, 2024 - dl.acm.org
Cloud object stores offer vastly different price points for object storage as a function of
workload and geography. Poor object placement can thus lead to significant cost overheads …

Demystifying Data Management for Large Language Models

X Miao, Z Jia, B Cui - Companion of the 2024 International Conference …, 2024 - dl.acm.org
Navigating the intricacies of data management in the era of Large Language Models (LLMs)
presents both challenges and opportunities for database and data management …

Mixed-integer linear programming models for optimizing the inclusion of jobs in batches and the orders of performing operations on them in …

KV Krotov - Информационно-управляющие системы (Information and Control Systems), 2024 - i-us.ru
Abstract Introduction: identifying efficient schedules for executing sets of same-type jobs
in pipeline systems involves searching for the best …

Empowering Large Language Models with Efficient and Automated Systems

Z Li - 2024 - search.proquest.com
Abstract Large Language Models (LLMs) have shown remarkable capabilities in a variety of
tasks, including chatting, programming, and searching. However, the high costs of LLMs are …

Timely and Efficient Resource Management in Networked Systems

Z Xu - 2024 - search.proquest.com
Resource management is ubiquitous in networked systems and ensures the effective
sharing of resources among various demands. Examples include traffic engineering, cluster …

[BOOK][B] Abstractions for Scaling Stateful Cloud Applications

P Kraft - 2023 - search.proquest.com
As the scale of both computing and data grows, developers are increasingly building
distributed stateful systems in the cloud. However, these systems are challenging to build at …