[HTML] Safe automated refactoring for intelligent parallelization of Java 8 streams

R Khatchadourian, Y Tang, M Bagherzadeh - Science of Computer …, 2020 - Elsevier
Streaming APIs are becoming more pervasive in mainstream Object-Oriented programming
languages and platforms. For example, the Stream API introduced in Java 8 allows for …

Incremental flattening for nested data parallelism

T Henriksen, F Thorøe, M Elsman… - Proceedings of the 24th …, 2019 - dl.acm.org
Compilation techniques for nested-parallel applications that can adapt to hardware and
dataset characteristics are vital for unlocking the power of modern hardware. This paper …

Efficient tiled sparse matrix multiplication through matrix signatures

SE Kurt, A Sukumaran-Rajam… - … Conference for High …, 2020 - ieeexplore.ieee.org
Tiling is a key technique to reduce data movement in matrix computations. While tiling is well
understood and widely used for dense matrix/tensor computations, effective tiling of sparse …

Automatic annotation of tasks in structured code

P Ramos, G Souza, D Soares, G Araújo… - Proceedings of the 27th …, 2018 - dl.acm.org
This paper describes the design and implementation of a suite of static analyses and code
generation techniques to annotate programs with OpenMP pragmas for task parallelism …

Fast multiplication of random dense matrices with sparse matrices

T Liang, R Murray, A Buluç… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
This work focuses on accelerating the multiplication of a dense random matrix with a (fixed)
sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme …

Fast multiplication of random dense matrices with fixed sparse matrices

T Liang, R Murray, A Buluç, J Demmel - arXiv preprint arXiv:2310.15419, 2023 - arxiv.org
This work focuses on accelerating the multiplication of a dense random matrix with a (fixed)
sparse matrix, which is frequently used in sketching algorithms. We develop a novel scheme …

[PDF] Lecture Notes for the Software Track of the PMPH Course

CE Oancea - Programming Massively Parallel Hardware, 2018 - hjemmesider.diku.dk
We then will turn our attention to legacy-sequential code written in programming languages
such as C. In this context we study dependence analysis, as a tool for reasoning about loop …

Modeling Data Movement for Sparse Matrix and Tensor Computations

SE Kurt - 2022 - search.proquest.com
Sparse matrix and tensor computations are challenging to optimize. In contrast to dense
matrix/tensor computations, the pattern of data access is typically irregular for sparse …

A functional approach to accelerating Monte Carlo based American option pricing

WM Pawlak, M Elsman, CE Oancea - Proceedings of the 31st …, 2019 - dl.acm.org
We study the feasibility and performance efficiency of expressing a complex financial
numerical algorithm with high-level functional parallel constructs. The algorithm we …

TaskMiner: Automatic identification of tasks

P Ramos, G Souza, G Leobas… - Proceedings of the XXII …, 2018 - dl.acm.org
This paper presents TaskMiner, a tool that automatically finds task parallelism in C code.
TaskMiner solves classic problems of irregular parallelism, such as finding the memory …