Machine learning methods for reliable resource provisioning in edge-cloud computing: A survey
Large-scale software systems are currently designed as distributed entities and deployed in
cloud data centers. To overcome the limitations inherent to this type of deployment …
Large models for time series and spatio-temporal data: A survey and outlook
Temporal data, notably time series and spatio-temporal data, are prevalent in real-world
applications. They capture dynamic system measurements and are produced in vast …
Who limits the resource efficiency of my datacenter: An analysis of Alibaba datacenter traces
Cloud platform provides great flexibility and cost-efficiency for end-users and cloud
operators. However, low resource utilization in modern datacenters brings huge wastes of …
Protean: VM allocation service at scale
We describe the design and implementation of Protean--the Microsoft Azure service
responsible for allocating Virtual Machines (VMs) to millions of servers around the globe. A …
Nu: Achieving Microsecond-Scale resource fungibility with logical processes
Datacenters waste significant compute and memory resources today because they lack
resource fungibility: the ability to reassign resources quickly and without disruption. We …
Characteristics of co-allocated online services and batch jobs in internet data centers: a case study from Alibaba cloud
In order to reduce power and energy costs, giant cloud providers now mix online and batch
jobs on the same cluster. Although the co-allocation of such jobs improves machine …
Workflow performance prediction based on graph structure aware deep attention neural network
With the rapid growth of cloud computing, efficient operational optimization and resource
scheduling of complex cloud business processes rely on real-time and accurate …
Disaggregated data centers: Challenges and trade-offs
Disaggregated data centers (DCs) can offer high flexibility for resource allocation; hence,
their resource utilization can be significantly improved. However, communications between …
Metis: Learning to schedule long-running applications in shared container clusters at scale
Online cloud services are increasingly deployed as long-running applications (LRAs) in
containers. Placing LRA containers is known to be difficult as they often have sophisticated …
In search of a fast and efficient serverless DAG engine
Python-written data analytics applications can be modeled as and compiled into a directed
acyclic graph (DAG) based workflow, where the nodes are fine-grained tasks and the edges …