DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference

U Gupta, S Hsia, V Saraph, X Wang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …

TailBench: A benchmark suite and evaluation methodology for latency-critical applications

H Kasture, D Sanchez - 2016 IEEE International Symposium on …, 2016 - ieeexplore.ieee.org
Latency-critical applications, common in datacenters, must achieve small and predictable
tail (e.g., 95th or 99th percentile) latencies. Their strict performance requirements limit …

Rubik: Fast analytical power management for latency-critical systems

H Kasture, DB Bartolini, N Beckmann… - Proceedings of the 48th …, 2015 - dl.acm.org
Latency-critical workloads (e.g., web search), common in datacenters, require stable tail (e.g.,
95th percentile) latencies of a few milliseconds. Servers running these workloads are kept …

Twig: Multi-agent task management for colocated latency-critical cloud services

R Nishtala, V Petrucci, P Carpenter… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Many of the important services running on data centres are latency-critical, time-varying, and
demand strict user satisfaction. Stringent tail-latency targets for colocated services and …

Metrics for sustainable data centers

VD Reddy, B Setz, GSVRK Rao… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
There are a multitude of metrics available to analyze individual key performance indicators
of data centers. In order to predict growth or set effective goals, it is important to choose the …

Octopus-Man: QoS-driven task management for heterogeneous multicores in warehouse-scale computers

V Petrucci, MA Laurenzano, J Doherty… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Heterogeneous multicore architectures have the potential to improve energy efficiency by
integrating power-efficient wimpy cores with high-performing brawny cores. However, it is an …

Hipster: Hybrid task manager for latency-critical cloud workloads

R Nishtala, P Carpenter, V Petrucci… - … Symposium on High …, 2017 - ieeexplore.ieee.org
In 2013, US data centers accounted for 2.2% of the country's total electricity consumption, a
figure that is projected to increase rapidly over the next decade. Many important workloads …

μDPM: Dynamic power management for the microsecond era

CH Chou, LN Bhuyan, D Wong - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
The complex, distributed nature of data centers has spawned the adoption of distributed,
multi-tiered software architectures, consisting of many inter-connected microservices. These …

Breaking the boundaries in heterogeneous-ISA datacenters

A Barbalace, R Lyerly, C Jelesnianski… - ACM SIGARCH …, 2017 - dl.acm.org
Energy efficiency is one of the most important design considerations in running modern
datacenters. Datacenter operating systems rely on software techniques such as execution …

KRISP: Enabling kernel-wise right-sizing for spatial partitioned GPU inference servers

M Chow, A Jahanshahi, D Wong - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Machine learning (ML) inference workloads present significantly different challenges than
ML training workloads. Typically, inference workloads are shorter running and under-utilize …