Mixed precision algorithms in numerical linear algebra

NJ Higham, T Mary - Acta Numerica, 2022 - cambridge.org
Today's floating-point arithmetic landscape is broader than ever. While scientific computing
has traditionally used single precision and double precision floating-point arithmetics, half …

Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance

H Ootomo, R Yokota - The International Journal of High …, 2022 - journals.sagepub.com
Tensor Core is a mixed-precision matrix–matrix multiplication unit on NVIDIA GPUs with a
theoretical peak performance of more than 300 TFlop/s on Ampere architectures. Tensor …

DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication

Y Lu, W Liu - Proceedings of the International Conference for High …, 2023 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) plays a key role in computational science and
engineering, graph processing, and machine learning applications. Much work on SpMV …

Matrix engines for high performance computing: A paragon of performance or grasping at straws?

J Domke, E Vatai, A Drozd, P Chen… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Matrix engines or units, in different forms and affinities, are becoming a reality in modern
processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep …

Mixed precision low-rank approximations and their application to block low-rank LU factorization

P Amestoy, O Boiteau, A Buttari… - IMA Journal of …, 2023 - academic.oup.com
We introduce a novel approach to exploit mixed precision arithmetic for low-rank
approximations. Our approach is based on the observation that singular vectors associated …

Numerical behavior of NVIDIA tensor cores

M Fasi, NJ Higham, M Mikaitis, S Pranesh - PeerJ Computer Science, 2021 - peerj.com
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are
hardware accelerators for mixed-precision matrix multiplication available on the Volta …

A comprehensive evaluation of novel AI accelerators for deep learning workloads

M Emani, Z Xie, S Raskar, V Sastry… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Scientific applications are increasingly adopting Artificial Intelligence (AI) techniques to
advance science. High-performance computing centers are evaluating emerging novel …

tcFFT: A fast half-precision FFT library for NVIDIA Tensor Cores

B Li, S Cheng, J Lin - 2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org
Mixed-precision computing becomes an inevitable trend for HPC and AI applications due to
the increasing use of mixed-precision units such as NVIDIA Tensor Cores. Fast Fourier …

Matrix multiplication in multiword arithmetic: Error analysis and application to GPU tensor cores

M Fasi, NJ Higham, F Lopez, T Mary, M Mikaitis - SIAM Journal on Scientific …, 2023 - SIAM
In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower
precision matrices, and a matrix product is formed by multiplying the constituents in low …

DGEMM on integer matrix multiplication unit

H Ootomo, K Ozaki, R Yokota - The International Journal of …, 2024 - journals.sagepub.com
Deep learning hardware achieves high throughput and low power consumption by reducing
computing precision and specializing in matrix multiplication. For machine learning …