DGEMM using tensor cores, and its accurate and reproducible versions

Mixed precision algorithms in numerical linear algebra

NJ Higham, T Mary - Acta Numerica, 2022 - cambridge.org

Today's floating-point arithmetic landscape is broader than ever. While scientific computing
has traditionally used single precision and double precision floating-point arithmetics, half …

Cited by 92 Related articles All 17 versions

Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance

H Ootomo, R Yokota - The International Journal of High …, 2022 - journals.sagepub.com

Tensor Core is a mixed-precision matrix–matrix multiplication unit on NVIDIA GPUs with a
theoretical peak performance of more than 300 TFlop/s on Ampere architectures. Tensor …

Cited by 34 Related articles All 7 versions

DASP: Specific Dense Matrix Multiply-Accumulate Units Accelerated General Sparse Matrix-Vector Multiplication

Y Lu, W Liu - Proceedings of the International Conference for High …, 2023 - dl.acm.org

Sparse matrix-vector multiplication (SpMV) plays a key role in computational science and
engineering, graph processing, and machine learning applications. Much work on SpMV …

Cited by 6 Related articles All 4 versions

Matrix engines for high performance computing: A paragon of performance or grasping at straws?

J Domke, E Vatai, A Drozd, P Chen… - 2021 IEEE …, 2021 - ieeexplore.ieee.org

Matrix engines or units, in different forms and affinities, are becoming a reality in modern
processors; CPUs and otherwise. The current and dominant algorithmic approach to Deep …

Cited by 38 Related articles All 7 versions

Mixed precision low-rank approximations and their application to block low-rank LU factorization

P Amestoy, O Boiteau, A Buttari… - IMA Journal of …, 2023 - academic.oup.com

We introduce a novel approach to exploit mixed precision arithmetic for low-rank
approximations. Our approach is based on the observation that singular vectors associated …

Cited by 25 Related articles All 28 versions


Numerical behavior of NVIDIA tensor cores

M Fasi, NJ Higham, M Mikaitis, S Pranesh - PeerJ Computer Science, 2021 - peerj.com

We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are
hardware accelerators for mixed-precision matrix multiplication available on the Volta …

Cited by 36 Related articles All 15 versions

A comprehensive evaluation of novel AI accelerators for deep learning workloads

M Emani, Z Xie, S Raskar, V Sastry… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org

Scientific applications are increasingly adopting Artificial Intelligence (AI) techniques to
advance science. High-performance computing centers are evaluating emerging novel …

Cited by 12 Related articles All 4 versions

tcFFT: A fast half-precision FFT library for NVIDIA tensor cores

B Li, S Cheng, J Lin - 2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org

Mixed-precision computing becomes an inevitable trend for HPC and AI applications due to
the increasing using mixed-precision units such as NVIDIA Tensor Cores. Fast Fourier …

Cited by 17 Related articles All 3 versions

Matrix multiplication in multiword arithmetic: Error analysis and application to GPU tensor cores

M Fasi, NJ Higham, F Lopez, T Mary, M Mikaitis - SIAM Journal on Scientific …, 2023 - SIAM

In multiword arithmetic, a matrix is represented as the unevaluated sum of two or more lower
precision matrices, and a matrix product is formed by multiplying the constituents in low …

Cited by 14 Related articles All 17 versions

DGEMM on integer matrix multiplication unit

H Ootomo, K Ozaki, R Yokota - The International Journal of …, 2024 - journals.sagepub.com

Deep learning hardware achieves high throughput and low power consumption by reducing
computing precision and specializing in matrix multiplication. For machine learning …

Cited by 3 Related articles All 3 versions