Benchmarking TPU, GPU, and CPU platforms for deep learning

YE Wang, GY Wei, D Brooks - arXiv preprint arXiv:1907.10701, 2019 - arxiv.org
Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware specialization to improve performance. To systematically benchmark …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Fathom: Reference workloads for modern deep learning methods

R Adolf, S Rama, B Reagen, GY Wei… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
Deep learning has been popularized by its recent successes on challenging artificial
intelligence problems. One of the reasons for its dominance is also an ongoing challenge …

REVAMP: A systematic framework for heterogeneous CGRA realization

TK Bandara, D Wijerathne, T Mitra, LS Peh - Proceedings of the 27th …, 2022 - dl.acm.org
Coarse-Grained Reconfigurable Architectures (CGRAs) provide an excellent balance
between performance, energy efficiency, and flexibility. However, increasingly sophisticated …

A systematic methodology for analysis of deep learning hardware and software platforms

Y Wang, GY Wei, D Brooks - Proceedings of Machine …, 2020 - proceedings.mlsys.org
Training deep learning models is compute-intensive and there is an industry-wide trend
towards hardware and software specialization to improve performance. To systematically …

Dynamic resource management of heterogeneous mobile platforms via imitation learning

SK Mandal, G Bhat, CA Patil, JR Doppa… - … Transactions on Very …, 2019 - ieeexplore.ieee.org
The complexity of heterogeneous mobile platforms is growing at a rate faster than our ability
to manage them optimally at runtime. For example, state-of-the-art systems-on-chip (SoCs) …

The accelerator wall: Limits of chip specialization

A Fuchs, D Wentzlaff - 2019 IEEE International Symposium on …, 2019 - ieeexplore.ieee.org
Specializing chips using hardware accelerators has become the prime means to alleviate
the gap between the growing computational demands and the stagnating transistor budgets …

DyPO: Dynamic Pareto-optimal configuration selection for heterogeneous MpSoCs

U Gupta, CA Patil, G Bhat, P Mishra… - ACM Transactions on …, 2017 - dl.acm.org
Modern multiprocessor systems-on-chip (MpSoCs) offer tremendous power and
performance optimization opportunities by tuning thousands of potential voltage, frequency …

An energy-aware online learning framework for resource management in heterogeneous platforms

SK Mandal, G Bhat, JR Doppa, PP Pande… - ACM Transactions on …, 2020 - dl.acm.org
Mobile platforms must satisfy the contradictory requirements of fast response time and
minimum energy consumption as a function of dynamically changing applications. To …

A deep Q-learning approach for dynamic management of heterogeneous processors

U Gupta, SK Mandal, M Mao… - IEEE Computer …, 2019 - ieeexplore.ieee.org
Heterogeneous multiprocessor system-on-chips (SoCs) provide a wide range of parameters
that can be managed dynamically. For example, one can control the type (big/little), number …