Few-to-many: Incremental parallelism for reducing tail latency in interactive services
Interactive services, such as Web search, recommendations, games, and finance, must
respond quickly to satisfy customers. Achieving this goal requires optimizing tail (e.g., 99th+ …
Adaptive, efficient, parallel execution of parallel programs
S Sridharan, G Gupta, GS Sohi - Proceedings of the 35th ACM SIGPLAN …, 2014 - dl.acm.org
Future multicore processors will be heterogeneous, be increasingly less reliable, and
operate in dynamically changing operating conditions. Such environments will result in a …
Providing high‐level self‐adaptive abstractions for stream parallelism on multicores
A Vogel, D Griebler… - Software: practice and …, 2021 - Wiley Online Library
Stream processing applications are common computing workloads that demand parallelism
to increase their performance. As in the past, parallel programming remains a difficult task …
Work stealing for interactive services to meet target latency
Interactive web services increasingly drive critical business workloads such as search,
advertising, games, shopping, and finance. Whereas optimizing parallel programs and …
A portable, automatic data quantizer for deep neural networks
With the proliferation of AI-based applications and services, there are strong demands for
efficient processing of deep neural networks (DNNs). DNNs are known to be both compute …
Smart, adaptive mapping of parallelism in the presence of external workload
MK Emani, Z Wang, MFP O'Boyle - Proceedings of the 2013 …, 2013 - ieeexplore.ieee.org
Given the wide scale adoption of multi-cores in main stream computing, parallel programs
rarely execute in isolation and have to share the platform with other applications that …
Swift machine learning model serving scheduling: a region based reinforcement learning approach
The success of machine learning has prospered Machine-Learning-as-a-Service (MLaaS)-
deploying trained machine learning (ML) models in cloud to provide low latency inference …
Adaptive parallelism for web search
A web search query made to Microsoft Bing is currently parallelized by distributing the query
processing across many servers. Within each of these servers, the query is, however …
Holistic run-time parallelism management for time and energy efficiency
S Sridharan, G Gupta, GS Sohi - Proceedings of the 27th international …, 2013 - dl.acm.org
The ubiquity of parallel machines will necessitate time-and energy-efficient parallel
execution of a program in a wide range of hardware and software environments. Prevalent …
Parcae: a system for flexible parallel execution
Workload, platform, and available resources constitute a parallel program's execution
environment. Most parallelization efforts statically target an anticipated range of …