Solving acoustic boundary integral equations using high performance tile low-rank LU factorization
We design and develop a new high performance implementation of a fast direct LU-based
solver using low-rank approximations on massively parallel systems. The LU factorization is …
Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification
Neural architecture search (NAS), which automates the design of neural network (NN)
architectures for scientific datasets, requires significant computational resources and time …
Programming heterogeneous architectures using hierarchical tasks
M Faverge, N Furmento, A Guermouche… - Concurrency and …, 2023 - Wiley Online Library
Task‐based systems have become popular due to their ability to utilize the computational
power of complex heterogeneous systems. A typical programming model used is the …
Tiled Algorithms for Efficient Task-Parallel ℌ-Matrix Solvers
In this paper, we describe and evaluate an extension of the CHAMELEON library to operate
with hierarchical matrices (H-Matrices) and hierarchical arithmetic (H-Arithmetic), producing …
Tiled algorithms for efficient task-parallel h-matrix solvers
In this paper, we describe and evaluate an extension of the Chameleon library to operate
with hierarchical matrices (H-Matrices) and hierarchical arithmetic (H-Arithmetic), producing …
On the use of hierarchical task for heterogeneous architectures
G Lucas - 2023 - theses.hal.science
In the last decades, the computing power of high-performance platforms has grown
exponentially at the expense of increased complexity. Programming such platforms to take …
O(N) distributed direct factorization of structured dense matrices using runtime systems
Structured dense matrices result from boundary integral problems in electrostatics and
geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal …
Futures for Dynamic Dependencies – Parallelizing the H-LU Factorization
R Nather, C Fohry - Workshop on Asynchronous Many-Task Systems and …, 2024 - Springer
The LU factorization of hierarchical matrices (H-matrices) is a challenging problem for
efficient parallelization, due to complex dependency patterns. Previous research suggested …
2D static resource allocation for compressed linear algebra and communication constraints
O Beaumont, L Eyraud-Dubois… - 2020 IEEE 27th …, 2020 - ieeexplore.ieee.org
This paper addresses static resource allocation problems for irregular distributed parallel
applications. More precisely, we focus on two classical tiled linear algebra kernels: the …
High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems
RM Alomairy - 2022 - repository.kaust.edu.sa
To leverage the extreme parallelism of emerging architectures, so that scientific applications
can fulfill their high fidelity and multi-physics potential while sustaining high efficiency …