Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Efficient loop unrolling factor prediction algorithm using machine learning models

I Singh, SK Singh, R Singh… - 2022 3rd International …, 2022 - ieeexplore.ieee.org
Loop unrolling is one of the prominent loop transformation techniques. It is used to increase
speed by replicating the loop body and decreasing branches. Loop unrolling is based on the …

Unifying primary cache, scratch, and register file memories in a throughput processor

M Gebhart, SW Keckler, B Khailany… - 2012 45th Annual …, 2012 - ieeexplore.ieee.org
Modern throughput processors such as GPUs employ thousands of threads to drive
high-bandwidth, long-latency memory systems. These threads require substantial on-chip storage …

GNNMark: A benchmark suite to characterize graph neural network training on GPUs

T Baruah, K Shivdikar, S Dong, Y Sun… - … Analysis of Systems …, 2021 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning
algorithms to train on non-Euclidean data. GNNs are widely used in recommender systems …

A computational model of neural contour processing: Figure-ground segregation and illusory contours

F Heitger, R von der Heydt… - Proceedings of PerAc'94 …, 1994 - ieeexplore.ieee.org
We present a computational model of contour processing that was suggested by
neurophysiological recordings from the monkey visual cortex. The model employs …

LTRF: Enabling high-capacity register files for GPUs via hardware/software cooperative register prefetching

M Sadrosadati, A Mirhosseini, SB Ehsani… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

Automatic generation of warp-level primitives and atomic instructions for fast and portable parallel reduction on GPUs

SG De Gonzalo, S Huang… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
Since the advent of GPU computing, GPU hardware has evolved at a fast pace. Since
application performance heavily depends on the latest hardware improvements …

Fine-grained network decomposition for massively parallel electromagnetic transient simulation of large power systems

Z Zhou, V Dinavahi - IEEE Power and Energy Technology …, 2017 - ieeexplore.ieee.org
Electromagnetic transient (EMT) simulation is one of the most complex power system studies
that requires detailed modeling of the study system including all frequency-dependent and …

Enabling Accelerators for Graph Computing

K Shivdikar - 2024 - search.proquest.com
The advent of Graph Neural Networks (GNNs) has revolutionized the field of
machine learning, offering a novel paradigm for learning on graph-structured data. Unlike …