SMAUG: End-to-end full-stack simulation infrastructure for deep learning workloads
In recent years, there have been tremendous advances in hardware acceleration of deep
neural networks. However, most of the research has focused on optimizing accelerator …
Deadline-aware offloading for high-throughput accelerators
TT Yeh, MD Sinclair, BM Beckmann… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Contemporary GPUs are widely used for throughput-oriented data-parallel workloads and
increasingly are being considered for latency-sensitive applications in datacenters …
Predict; don't react for enabling efficient fine-grain DVFS in GPUs
S Bharadwaj, S Das, K Mazumdar… - Proceedings of the 28th …, 2023 - dl.acm.org
With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast,
adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have …
DUB: Dynamic underclocking and bypassing in NoCs for heterogeneous GPU workloads
S Bharadwaj, S Das, Y Eckert, M Oskin… - Proceedings of the 15th …, 2021 - dl.acm.org
The performance of graphics processing units (GPU) workloads can be sensitive to the
various clock domains which are dynamically tunable in modern GPUs. In this work, we …
MiCache: An MSHR-inclusive Non-blocking Cache Design for FPGAs
S Xu, S Lu, Z Shao, X Liao, H Jin - Proceedings of the 2024 ACM/SIGDA …, 2024 - dl.acm.org
On FPGAs, customizing data parallelism can significantly improve the performance of
applications. However, a large number of applications, such as sparse matrix multiplication …
Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs
Graphics Processing Units (GPUs) are known for the large computing capabilities they offer
users compared to traditional CPUs. However, the issue of resource under-utilization is …
DELTA: Validate GPU Memory Profiling with Microbenchmarks
X Zhang, E Shcherbakov - … of the International Symposium on Memory …, 2020 - dl.acm.org
With the advent of GPU computing, profiling tools are now widely used to assist developers
in identifying and solving performance bottlenecks. These tools commonly rely on …
Evaluating pseudo-random SRAM for AI applications in GPU cache
K ASARE - 2024 - lup.lub.lu.se
General-Purpose Graphics Processing Units (GPGPUs) have become the prevalent
processor for AI/ML and other large computational problems because parallel processing …
Benchmarking, Profiling and White-Box Performance Modeling for DNN Training
H Zhu - 2022 - search.proquest.com
Training a modern deep learning model is extremely time-consuming. The
software/hardware deployments that machine learning (ML) programmers use in practice …
Milestone M6 Report: Reducing Excess Data Movement Part 1
This is the second in a sequence of three Hardware Evaluation milestones that provide
insight into the following questions: What are the sources of excess data movement across …