Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

End-to-end deep learning of optimization heuristics

C Cummins, P Petoumenos, Z Wang… - 2017 26th …, 2017 - ieeexplore.ieee.org
Accurate automatic optimization heuristics are necessary for dealing with the complexity and
diversity of modern hardware and software. Machine learning is a proven technique for …

Adaptive sparse tiling for sparse matrix multiplication

C Hong, A Sukumaran-Rajam, I Nisa, K Singh… - Proceedings of the 24th …, 2019 - dl.acm.org
Tiling is a key technique for data locality optimization and is widely used in high-
performance implementations of dense matrix-matrix multiplication for multicore/manycore …

Introducing SLAMBench, a performance and accuracy benchmarking methodology for SLAM

L Nardi, B Bodin, MZ Zia, J Mawer… - … on robotics and …, 2015 - ieeexplore.ieee.org
Real-time dense computer vision and SLAM offer great potential for a new level of scene
modelling, tracking and real environmental interaction for many types of robot, but their high …

IR2Vec: LLVM IR Based Scalable Program Embeddings

S VenkataKeerthy, R Aggarwal, S Jain… - ACM Transactions on …, 2020 - dl.acm.org
We propose IR2Vec, a Concise and Scalable encoding infrastructure to represent programs
as a distributed embedding in continuous space. This distributed embedding is obtained by …

Transpilers: A systematic mapping review of their usage in research and industry

A Bastidas Fuertes, M Pérez, J Meza Hormaza - Applied Sciences, 2023 - mdpi.com
Transpilers refer to a special type of compilation that takes source code and translates it into
target source code. This type of technique has been used for different types of …

Benchmarking machine learning methods for performance modeling of scientific applications

P Malakar, P Balaprakash… - 2018 IEEE/ACM …, 2018 - ieeexplore.ieee.org
Performance modeling is an important and active area of research in high-performance
computing (HPC). It helps in better job scheduling and also improves overall performance of …

Automatic optimization of thread-coarsening for graphics processors

A Magni, C Dubach, M O'Boyle - … of the 23rd international conference on …, 2014 - dl.acm.org
OpenCL has been designed to achieve functional portability across multi-core devices from
different vendors. However, the lack of a single cross-target optimizing compiler severely …

Integrating algorithmic parameters into benchmarking and design space exploration in 3D scene understanding

B Bodin, L Nardi, MZ Zia, H Wagstaff… - Proceedings of the …, 2016 - dl.acm.org
System designers typically use well-studied benchmarks to evaluate and improve new
architectures and compilers. We design tomorrow's systems based on yesterday's …

LTRF: Enabling high-capacity register files for GPUs via hardware/software cooperative register prefetching

M Sadrosadati, A Mirhosseini, SB Ehsani… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …