DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …
TailBench: a benchmark suite and evaluation methodology for latency-critical applications
Latency-critical applications, common in datacenters, must achieve small and predictable
tail (e.g., 95th or 99th percentile) latencies. Their strict performance requirements limit …
Rubik: Fast analytical power management for latency-critical systems
Latency-critical workloads (e.g., web search), common in datacenters, require stable tail (e.g.,
95th percentile) latencies of a few milliseconds. Servers running these workloads are kept …
Twig: Multi-agent task management for colocated latency-critical cloud services
Many of the important services running on data centres are latency-critical, time-varying, and
demand strict user satisfaction. Stringent tail-latency targets for colocated services and …
Metrics for sustainable data centers
There are a multitude of metrics available to analyze individual key performance indicators
of data centers. In order to predict growth or set effective goals, it is important to choose the …
Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers
V Petrucci, MA Laurenzano, J Doherty… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Heterogeneous multicore architectures have the potential to improve energy efficiency by
integrating power-efficient wimpy cores with high-performing brawny cores. However, it is an …
Hipster: Hybrid task manager for latency-critical cloud workloads
In 2013, US data centers accounted for 2.2% of the country's total electricity consumption, a
figure that is projected to increase rapidly over the next decade. Many important workloads …
μDPM: Dynamic power management for the microsecond era
The complex, distributed nature of data centers has spawned the adoption of distributed,
multi-tiered software architectures, consisting of many inter-connected microservices. These …
Breaking the boundaries in heterogeneous-ISA datacenters
A Barbalace, R Lyerly, C Jelesnianski… - ACM SIGARCH …, 2017 - dl.acm.org
Energy efficiency is one of the most important design considerations in running modern
datacenters. Datacenter operating systems rely on software techniques such as execution …
KRISP: Enabling kernel-wise right-sizing for spatial partitioned GPU inference servers
Machine learning (ML) inference workloads present significantly different challenges than
ML training workloads. Typically, inference workloads are shorter running and under-utilize …