Snake: A variable-length chain-based prefetching for GPUs

S Mostofi, H Falahati, N Mahani… - Proceedings of the 56th …, 2023 - dl.acm.org
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …

OSM: Off-chip shared memory for GPUs

S Darabi, E Yousefzadeh-Asl-Miandoab… - … on Parallel and …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) employ a shared memory, a software-managed cache for
programmers, in each streaming multiprocessor to accelerate data sharing among the …

CV32RT: Enabling Fast Interrupt and Context Switching for RISC-V Microcontrollers

R Balas, A Ottaviano, L Benini - IEEE Transactions on Very …, 2024 - ieeexplore.ieee.org
Processors using the open RISC-V instruction set architecture (ISA) are finding increasing
adoption in the embedded world. Many embedded use cases have real-time constraints and …

Adaptable register file organization for vector processors

CR Lazo, E Reggiani, CR Morales… - … Symposium on High …, 2022 - ieeexplore.ieee.org
Contemporary Vector Processors (VPs) are designed either for short vector lengths, e.g.,
Fujitsu A64FX with 512-bit ARM SVE vector support, or long vectors, e.g., NEC Aurora …

Cross-core data sharing for energy-efficient GPUs

H Falahati, M Sadrosadati, Q Xu… - ACM Transactions on …, 2024 - dl.acm.org
Graphics Processing Units (GPUs) are the accelerator of choice in a variety of application
domains, because they can accelerate massively parallel workloads and can be easily …

Investigating Register Cache Behavior: Implications for CUDA and Tensor Core Workloads on GPUs

V Geraeinejad, Q Qian… - IEEE Journal on Emerging …, 2024 - ieeexplore.ieee.org
GPUs are extensively employed as the primary devices for running a broad spectrum of
applications, covering general-purpose applications as well as Artificial Intelligence (AI) …

PresCount: Effective Register Allocation for Bank Conflict Reduction

X Guan, H Zhou, G Bao, H Li, L Zhu… - 2024 IEEE/ACM …, 2024 - ieeexplore.ieee.org
Modern processors with large multi-banked register files often rely on hardware solutions to
resolve bank conflicts efficiently. However, these hardware-based methods, while flexible …