A hierarchical fast direct solver for distributed memory machines with manycore nodes

N Al-Harthi, R Alomairy, K Akbudak, R Chen… - … Conference, ISC High …, 2020 - Springer

We design and develop a new high performance implementation of a fast direct LU-based
solver using low-rank approximations on massively parallel systems. The LU factorization is …

Cited by 28 · Related articles · All 7 versions

[PDF] acm.org

Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Classification

G Channing, R Patel, P Olaya, A Rorabaugh… - Proceedings of the …, 2023 - dl.acm.org

Neural architecture search (NAS), which automates the design of neural network (NN)
architectures for scientific datasets, requires significant computational resources and time …

Cited by 1 · Related articles · All 2 versions

[PDF] hal.science

Programming heterogeneous architectures using hierarchical tasks

M Faverge, N Furmento, A Guermouche… - Concurrency and …, 2023 - Wiley Online Library

Task‐based systems have become popular due to their ability to utilize the computational
power of complex heterogeneous systems. A typical programming model used is the …

Cited by 8 · Related articles · All 13 versions

Tiled Algorithms for Efficient Task-Parallel ℌ-Matrix Solvers

R Carratalá-Sáez, M Faverge, G Pichon… - 2020 IEEE …, 2020 - ieeexplore.ieee.org

In this paper, we describe and evaluate an extension of the CHAMELEON library to operate
with hierarchical matrices (H-Matrices) and hierarchical arithmetic (H-Arithmetic), producing …

Cited by 7 · Related articles · All 2 versions

[PDF] hal.science

Tiled algorithms for efficient task-parallel h-matrix solvers

R Carratalá-Sáez, M Faverge, G Pichon, G Sylvand… - 2020 - inria.hal.science

In this paper, we describe and evaluate an extension of the Chameleon library to operate
with hierarchical matrices (H-Matrices) and hierarchical arithmetic (H-Arithmetic), producing …

Cited by 7 · Related articles · All 5 versions

[PDF] hal.science

On the use of hierarchical task for heterogeneous architectures

G Lucas - 2023 - theses.hal.science

In the last decades, the computing power of high-performance platforms has grown
exponentially at the expense of increased complexity. Programming such platforms to take …

Cited by 2 · Related articles · All 4 versions

[PDF] arxiv.org

O(N) distributed direct factorization of structured dense matrices using runtime systems

S Deshmukh, R Yokota, G Bosilca, Q Ma - Proceedings of the 52nd …, 2023 - dl.acm.org

Structured dense matrices result from boundary integral problems in electrostatics and
geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal …

Cited by 2 · Related articles · All 3 versions

Futures for Dynamic Dependencies – Parallelizing the ℋ-LU Factorization

R Nather, C Fohry - Workshop on Asynchronous Many-Task Systems and …, 2024 - Springer

The LU factorization of hierarchical matrices (H-matrices) is a challenging problem for
efficient parallelization, due to complex dependency patterns. Previous research suggested …

[PDF] hal.science

2D static resource allocation for compressed linear algebra and communication constraints

O Beaumont, L Eyraud-Dubois… - 2020 IEEE 27th …, 2020 - ieeexplore.ieee.org

This paper addresses static resource allocation problems for irregular distributed parallel
applications. More precisely, we focus on two classical tiled linear algebra kernels: the …

Cited by 2 · Related articles · All 8 versions

[PDF] kaust.edu.sa

High-Performance Scientific Applications Using Mixed Precision and Low-Rank Approximation Powered by Task-based Runtime Systems

RM Alomairy - 2022 - repository.kaust.edu.sa

To leverage the extreme parallelism of emerging architectures, so that scientific applications
can fulfill their high fidelity and multi-physics potential while sustaining high efficiency …