Performance, design, and autotuning of batched GEMM for GPUs
The general matrix-matrix multiplication (GEMM) is the most important numerical kernel in
dense linear algebra, and is the key component for obtaining high performance in most …
The design and performance of batched BLAS on modern high-performance computing systems
A current trend in high-performance computing is to decompose a large linear algebra
problem into batches containing thousands of smaller problems that can be solved …
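The batching idea these entries describe, decomposing one large problem into many small, independent ones, can be sketched in plain Python. This is a loop-based illustration with hypothetical function names; a real batched BLAS implementation would dispatch all problems to the accelerator concurrently (e.g. one thread block per problem) rather than iterating:

```python
def gemm(A, B):
    """Plain matrix-matrix product C = A * B for lists of lists."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def batched_gemm(As, Bs):
    """A batch of independent small GEMMs.

    Each problem is independent of the others, so they can all be
    solved in parallel; here we simply loop for clarity.
    """
    return [gemm(A, B) for A, B in zip(As, Bs)]

# A batch of two 2x2 problems.
As = [[[1, 2], [3, 4]], [[1, 0], [0, 1]]]
Bs = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]
Cs = batched_gemm(As, Bs)
```

The independence of the problems is the key property: it removes all synchronization between batch entries, which is what makes the GPU mapping effective.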
MiniApps derived from production HPC applications using multiple programming models
OEB Messer, E D'Azevedo, J Hill… - … Journal of High …, 2018 - journals.sagepub.com
We have developed a set of reduced, proxy applications (“MiniApps”) based on large-scale
application codes supported at the Oak Ridge Leadership Computing Facility (OLCF). The …
RETRACTED: Batched matrix computations on hardware accelerators based on GPUs
Scientific applications require solvers that work on many small-size problems that are
independent of each other. At the same time, high-end hardware evolves rapidly and …
A set of batched basic linear algebra subprograms and LAPACK routines
This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms
(Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small …
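Batched BLAS interfaces of this kind typically take arrays of matrices plus BLAS-style sizing parameters, with one call covering the whole batch. The following pure-Python sketch mimics that calling style for a fixed-size double-precision batch; the name `dgemm_batch` and its exact parameter order are illustrative, not the standard's:

```python
def dgemm_batch(m, n, k, alpha, A_batch, lda, B_batch, ldb,
                beta, C_batch, ldc, batch_count):
    """Fixed-size batched GEMM: C_i = alpha * A_i * B_i + beta * C_i.

    Each matrix is a flat column-major buffer, and lda/ldb/ldc are
    leading dimensions in the usual BLAS sense. No transpose options
    are handled, to keep the sketch short.
    """
    for b in range(batch_count):
        A, B, C = A_batch[b], B_batch[b], C_batch[b]
        for j in range(n):
            for i in range(m):
                acc = sum(A[i + p * lda] * B[p + j * ldb]
                          for p in range(k))
                C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc]

# One 2x2 problem in a batch of size 1, column-major storage.
A_batch = [[1.0, 3.0, 2.0, 4.0]]   # [[1, 2], [3, 4]]
B_batch = [[5.0, 7.0, 6.0, 8.0]]   # [[5, 6], [7, 8]]
C_batch = [[0.0] * 4]
dgemm_batch(2, 2, 2, 1.0, A_batch, 2, B_batch, 2, 0.0, C_batch, 2, 1)
```

Passing sizes and leading dimensions once for the whole batch (rather than per problem) is what distinguishes the fixed-size batched interface from simply calling GEMM in a loop.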
LU factorization of small matrices: Accelerating batched DGETRF on the GPU
Gaussian Elimination is commonly used to solve dense linear systems in scientific models.
In a large number of applications, a need arises to solve many small size problems, instead …
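The Gaussian elimination that DGETRF performs can be shown in an unblocked form on one small matrix. This sketch omits the partial pivoting that the real DGETRF uses, so it is illustrative only; in the batched setting, one such factorization would run per problem, all in parallel:

```python
def getrf_nopiv(A):
    """Unblocked LU factorization without pivoting.

    Overwrites the n x n matrix A (list of lists) in place so that
    the strictly lower part holds L (unit diagonal implied) and the
    upper part holds U. Real DGETRF adds partial pivoting for
    numerical stability.
    """
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]               # multiplier l_ik
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # trailing update
    return A

A = [[4.0, 3.0], [6.0, 3.0]]
getrf_nopiv(A)   # A now holds L\U: L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]]
```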
Parallel programming models for dense linear algebra on heterogeneous systems
We present a review of the current best practices in parallel programming models for dense
linear algebra (DLA) on heterogeneous architectures. We consider multicore CPUs, stand …
Addressing irregular patterns of matrix computations on GPUs and their impact on applications powered by sparse direct solvers
Many scientific applications rely on sparse direct solvers for their numerical robustness.
However, performance optimization for these solvers remains a challenging task, especially …
A framework for batched and GPU-resident factorization algorithms applied to block householder transformations
As modern hardware keeps evolving, an increasingly effective approach to developing
energy efficient and high-performance solvers is to design them to work on many small size …
Batch QR factorization on GPUs: Design, optimization, and tuning
QR factorization of dense matrices is a ubiquitous tool in high performance computing
(HPC). From solving linear systems and least squares problems to eigenvalue problems …
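For a sense of what a QR factorization computes, here is a modified Gram-Schmidt sketch in plain Python. This is a simpler algorithm than the blocked Householder approach the batch QR work tunes for GPUs, and is included only to show the A = QR decomposition itself:

```python
def mgs_qr(A):
    """QR via modified Gram-Schmidt: A = Q * R with Q orthonormal.

    A is m x n (list of rows, m >= n, full column rank assumed).
    Returns (Q, R) with Q m x n row-major and R n x n upper triangular.
    """
    m, n = len(A), len(A[0])
    V = [[A[i][j] for i in range(m)] for j in range(n)]  # columns of A
    R = [[0.0] * n for _ in range(n)]
    Qcols = []
    for j in range(n):
        R[j][j] = sum(x * x for x in V[j]) ** 0.5
        q = [x / R[j][j] for x in V[j]]
        Qcols.append(q)
        for k in range(j + 1, n):
            R[j][k] = sum(q[i] * V[k][i] for i in range(m))
            V[k] = [V[k][i] - R[j][k] * q[i] for i in range(m)]
    Q = [[Qcols[j][i] for j in range(n)] for i in range(m)]
    return Q, R

Q, R = mgs_qr([[3.0, 1.0], [4.0, 1.0]])
```

Householder-based QR is preferred in practice for its unconditional stability and its blocked formulation, which casts most work as GEMM, exactly the property the batched GPU designs above exploit.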