SMAUG: End-to-end full-stack simulation infrastructure for deep learning workloads

S Xi, Y Yao, K Bhardwaj, P Whatmough… - ACM Transactions on …, 2020 - dl.acm.org
In recent years, there have been tremendous advances in hardware acceleration of deep
neural networks. However, most of the research has focused on optimizing accelerator …

Deadline-aware offloading for high-throughput accelerators

TT Yeh, MD Sinclair, BM Beckmann… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Contemporary GPUs are widely used for throughput-oriented data-parallel workloads and
increasingly are being considered for latency-sensitive applications in datacenters …

Predict; don't react for enabling efficient fine-grain DVFS in GPUs

S Bharadwaj, S Das, K Mazumdar… - Proceedings of the 28th …, 2023 - dl.acm.org
With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast,
adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have …

DUB: Dynamic underclocking and bypassing in NoCs for heterogeneous GPU workloads

S Bharadwaj, S Das, Y Eckert, M Oskin… - Proceedings of the 15th …, 2021 - dl.acm.org
The performance of graphics processing unit (GPU) workloads can be sensitive to the
various clock domains which are dynamically tunable in modern GPUs. In this work, we …

MiCache: An MSHR-inclusive Non-blocking Cache Design for FPGAs

S Xu, S Lu, Z Shao, X Liao, H Jin - Proceedings of the 2024 ACM/SIGDA …, 2024 - dl.acm.org
On FPGAs, customizing data parallelism can significantly improve the performance of
applications. However, a large number of applications, such as sparse matrix multiplication …

Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs

T Adufu, J Ha, Y Kim - … 2024-2024 IEEE Network Operations and …, 2024 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are known for the large computing capabilities they offer
users compared to traditional CPUs. However, the issue of resource under-utilization is …

DELTA: Validate GPU Memory Profiling with Microbenchmarks

X Zhang, E Shcherbakov - … of the International Symposium on Memory …, 2020 - dl.acm.org
With the advent of GPU computing, profiling tools are now widely used to assist developers
in identifying and solving performance bottlenecks. Those tools commonly rely on …

Evaluating pseudo-random SRAM for AI applications in GPU cache

K Asare - 2024 - lup.lub.lu.se
General Purpose Graphics Processing Units (GPGPUs) have become the prevalent
processor for AI/ML and other large computational problems because parallel processing …

Benchmarking, Profiling and White-Box Performance Modeling for DNN Training

H Zhu - 2022 - search.proquest.com
Training a modern deep learning model is extremely time-consuming. The
software/hardware deployments that machine learning (ML) programmers use in practice …

Milestone M6 Report: Reducing Excess Data Movement Part 1

I Peng, GR Voskuilen, A Sarkar, D Boehme, R Long… - 2021 - osti.gov
This is the second in a sequence of three Hardware Evaluation milestones that provide
insight into the following questions: What are the sources of excess data movement across …