Splitwise: Efficient generative LLM inference using phase splitting
Generative large language model (LLM) applications are growing rapidly, leading to large-
scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …
A survey of techniques for architecting and managing asymmetric multicore processors
S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
To meet the needs of a diverse range of workloads, asymmetric multicore processors
(AMPs) have been proposed, which feature cores of different microarchitecture or ISAs …
Paragon: QoS-aware scheduling for heterogeneous datacenters
C Delimitrou, C Kozyrakis - ACM SIGPLAN Notices, 2013 - dl.acm.org
Large-scale datacenters (DCs) host tens of thousands of diverse applications each day.
However, interference between colocated workloads and the difficulty to match applications …
Research problems and opportunities in memory systems
O Mutlu, L Subramanian - Supercomputing frontiers and …, 2014 - superfri.susu.ru
The memory system is a fundamental performance and energy bottleneck in almost all
computing systems. Recent system design, application, and technology trends that require …
Hierarchical power management for asymmetric multi-core in dark silicon era
TS Muthukaruppan, M Pricopi… - Proceedings of the 50th …, 2013 - dl.acm.org
Asymmetric multi-core architectures integrating cores with diverse power-performance
characteristics is emerging as a promising alternative in the dark silicon era where only a …
MISE: Providing performance predictability and improving fairness in shared main memory systems
Applications running concurrently on a multicore system interfere with each other at the main
memory. This interference can slow down different applications differently. Accurately …
High-performance and energy-efficient mobile web browsing on big/little systems
Internet web browsing has reached a critical tipping point. Increasingly, users rely more on
mobile web browsers to access the Internet than desktop browsers. Meanwhile, webpages …
Composite cores: Pushing heterogeneity into a core
Heterogeneous multicore systems--comprised of multiple cores with varying capabilities,
performance, and energy characteristics--have emerged as a promising approach to …
Learning-based run-time power and energy management of multi/many-core systems: Current and future trends
Multi/Many-core systems are prevalent in several application domains targeting different
scales of computing such as embedded and cloud computing. These systems are able to …
SPARTA: Runtime task allocation for energy efficient heterogeneous many-cores
To meet the performance and energy efficiency demands of emerging complex and variable
workloads, heterogeneous many-core architectures are increasingly being deployed …