Machine learning methods for reliable resource provisioning in edge-cloud computing: A survey

TL Duc, RG Leiva, P Casari, PO Östberg - ACM Computing Surveys …, 2019 - dl.acm.org
Large-scale software systems are currently designed as distributed entities and deployed in
cloud data centers. To overcome the limitations inherent to this type of deployment …

Large models for time series and spatio-temporal data: A survey and outlook

M Jin, Q Wen, Y Liang, C Zhang, S Xue, X Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world
applications. They capture dynamic system measurements and are produced in vast …

Who limits the resource efficiency of my datacenter: An analysis of Alibaba datacenter traces

J Guo, Z Chang, S Wang, H Ding, Y Feng… - Proceedings of the …, 2019 - dl.acm.org
Cloud platform provides great flexibility and cost-efficiency for end-users and cloud
operators. However, low resource utilization in modern datacenters brings huge wastes of …

Protean: VM allocation service at scale

O Hadary, L Marshall, I Menache, A Pan… - … USENIX Symposium on …, 2020 - usenix.org
We describe the design and implementation of Protean--the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …

Nu: Achieving Microsecond-Scale resource fungibility with logical processes

Z Ruan, SJ Park, MK Aguilera, A Belay… - … USENIX Symposium on …, 2023 - usenix.org
Datacenters waste significant compute and memory resources today because they lack
resource fungibility: the ability to reassign resources quickly and without disruption. We …

Characteristics of co-allocated online services and batch jobs in internet data centers: a case study from Alibaba Cloud

C Jiang, G Han, J Lin, G Jia, W Shi, J Wan - IEEE Access, 2019 - ieeexplore.ieee.org
In order to reduce power and energy costs, giant cloud providers now mix online and batch
jobs on the same cluster. Although the co-allocation of such jobs improves machine …

Workflow performance prediction based on graph structure aware deep attention neural network

J Yu, M Gao, Y Li, Z Zhang, WH Ip, KL Yung - Journal of Industrial …, 2022 - Elsevier
With the rapid growth of cloud computing, efficient operational optimization and resource
scheduling of complex cloud business processes rely on real-time and accurate …

Disaggregated data centers: Challenges and trade-offs

R Lin, Y Cheng, M De Andrade… - IEEE …, 2020 - ieeexplore.ieee.org
Disaggregated data centers (DCs) can offer high flexibility for resource allocation; hence,
their resource utilization can be significantly improved. However, communications between …

Metis: Learning to schedule long-running applications in shared container clusters at scale

L Wang, Q Weng, W Wang, C Chen… - … Conference for High …, 2020 - ieeexplore.ieee.org
Online cloud services are increasingly deployed as long-running applications (LRAs) in
containers. Placing LRA containers is known to be difficult as they often have sophisticated …

In search of a fast and efficient serverless DAG engine

B Carver, J Zhang, A Wang… - 2019 IEEE/ACM Fourth …, 2019 - ieeexplore.ieee.org
Python-written data analytics applications can be modeled as and compiled into a directed
acyclic graph (DAG) based workflow, where the nodes are fine-grained tasks and the edges …