Optimizing GPU cache policies for MI workloads

S Xi, Y Yao, K Bhardwaj, P Whatmough… - ACM Transactions on …, 2020 - dl.acm.org

In recent years, there has been tremendous advances in hardware acceleration of deep
neural networks. However, most of the research has focused on optimizing accelerator …

Cited by 57 · Related articles · All 6 versions

[PDF] purdue.edu

Deadline-aware offloading for high-throughput accelerators

TT Yeh, MD Sinclair, BM Beckmann… - … Symposium on High …, 2021 - ieeexplore.ieee.org

Contemporary GPUs are widely used for throughput-oriented data-parallel workloads and
increasingly are being considered for latency-sensitive applications in datacenters …

Cited by 14 · Related articles · All 10 versions

[PDF] acm.org

Predict; don't react for enabling efficient fine-grain DVFS in GPUs

S Bharadwaj, S Das, K Mazumdar… - Proceedings of the 28th …, 2023 - dl.acm.org

With the continuous improvement of on-chip integrated voltage regulators (IVRs) and fast,
adaptive frequency control, dynamic voltage-frequency scaling (DVFS) transition times have …

Cited by 6 · Related articles · All 4 versions

DUB: Dynamic underclocking and bypassing in NoCs for heterogeneous GPU workloads

S Bharadwaj, S Das, Y Eckert, M Oskin… - Proceedings of the 15th …, 2021 - dl.acm.org

The performance of graphics processing units (GPU) workloads can be sensitive to the
various clock domains which are dynamically tunable in modern GPUs. In this work, we …

Cited by 6 · Related articles · All 2 versions

[PDF] archive.org

MiCache: An MSHR-inclusive Non-blocking Cache Design for FPGAs

S Xu, S Lu, Z Shao, X Liao, H Jin - Proceedings of the 2024 ACM/SIGDA …, 2024 - dl.acm.org

On FPGAs, customizing data parallelism can significantly improve performances of
applications. However, a large number of applications, such as sparse matrix multiplication …

Application-aware Resource Sharing using Software and Hardware Partitioning on Modern GPUs

T Adufu, J Ha, Y Kim - … 2024-2024 IEEE Network Operations and …, 2024 - ieeexplore.ieee.org

Graphic Processing Units (GPUs) are known for the large computing capabilities they offer
users compared to traditional CPUs. However, the issue of resource under-utilization is …

[PDF] github.io

DELTA: Validate GPU Memory Profiling with Microbenchmarks

X Zhang, E Shcherbakov - … of the International Symposium on Memory …, 2020 - dl.acm.org

With the advent of GPU computing, profiling tools are now widely used to assist developers
in identifying and solving performance bottlenecks. Those tools are commonly relying on …

Cited by 3 · Related articles · All 4 versions

[PDF] lu.se

[PDF] Evaluating pseudo-random SRAM for AI applications in GPU cache

K Asare - 2024 - lup.lub.lu.se

Abstract General Purpose Graphics Processing Units (GPGPUs) have become the prevalent
processor for AI/ML and other large computational problems because parallel processing …

Benchmarking, Profiling and White-Box Performance Modeling for DNN Training

H Zhu - 2022 - search.proquest.com

Training a modern deep learning model is extremely time-consuming. The
software/hardware deployments that machine learning (ML) programmers use in practice …

Milestone M6 Report: Reducing Excess Data Movement Part 1

I Peng, GR Voskuilen, A Sarkar, D Boehme, R Long… - 2021 - osti.gov

This is the second in a sequence of three Hardware Evaluation milestones that provide
insight into the following questions: What are the sources of excess data movement across …