Solving acoustic boundary integral equations using high performance tile low-rank LU factorization

N Al-Harthi, R Alomairy, K Akbudak, R Chen… - … Conference, ISC High …, 2020 - Springer
We design and develop a new high performance implementation of a fast direct LU-based
solver using low-rank approximations on massively parallel systems. The LU factorization is …

Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification

G Channing, R Patel, P Olaya, A Rorabaugh… - Proceedings of the …, 2023 - dl.acm.org
Neural architecture search (NAS), which automates the design of neural network (NN)
architectures for scientific datasets, requires significant computational resources and time …

Programming heterogeneous architectures using hierarchical tasks

M Faverge, N Furmento, A Guermouche… - Concurrency and …, 2023 - Wiley Online Library
Task‐based systems have become popular due to their ability to utilize the computational
power of complex heterogeneous systems. A typical programming model used is the …

Tiled Algorithms for Efficient Task-Parallel H-Matrix Solvers

R Carratalá-Sáez, M Faverge, G Pichon… - 2020 IEEE …, 2020 - ieeexplore.ieee.org
In this paper, we describe and evaluate an extension of the CHAMELEON library to operate
with hierarchical matrices (H-Matrices) and hierarchical arithmetic (H-Arithmetic), producing …

Tiled algorithms for efficient task-parallel h-matrix solvers

R Carratalá-Sáez, M Faverge, G Pichon, G Sylvand… - 2020 - inria.hal.science
In this paper, we describe and evaluate an extension of the Chameleon library to operate
with hierarchical matrices (H-Matrices) and hierarchical arithmetic (H-Arithmetic), producing …

On the use of hierarchical task for heterogeneous architectures

G Lucas - 2023 - theses.hal.science
In the last decades, the computing power of high-performance platforms has grown
exponentially at the expense of increased complexity. Programming such platforms to take …

O(N) distributed direct factorization of structured dense matrices using runtime systems

S Deshmukh, R Yokota, G Bosilca, Q Ma - Proceedings of the 52nd …, 2023 - dl.acm.org
Structured dense matrices result from boundary integral problems in electrostatics and
geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal …

Futures for Dynamic Dependencies – Parallelizing the H-LU Factorization

R Nather, C Fohry - Workshop on Asynchronous Many-Task Systems and …, 2024 - Springer
The LU factorization of hierarchical matrices (H-matrices) is a challenging problem for
efficient parallelization, due to complex dependency patterns. Previous research suggested …

2D static resource allocation for compressed linear algebra and communication constraints

O Beaumont, L Eyraud-Dubois… - 2020 IEEE 27th …, 2020 - ieeexplore.ieee.org
This paper addresses static resource allocation problems for irregular distributed parallel
applications. More precisely, we focus on two classical tiled linear algebra kernels: the …

High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems

RM Alomairy - 2022 - repository.kaust.edu.sa
To leverage the extreme parallelism of emerging architectures, so that scientific applications
can fulfill their high fidelity and multi-physics potential while sustaining high efficiency …