Matrix engines for high performance computing: A paragon of performance or grasping at straws? J Domke, E Vatai, A Drozd, P Chen, Y Oyama, L Zhang, S Salaria, ... 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 36 | 2021 |
DGEMM using tensor cores, and its accurate and reproducible versions D Mukunoki, K Ozaki, T Ogita, T Imamura International Conference on High Performance Computing, 230-248, 2020 | 26 | 2020 |
Reproducible BLAS routines with tunable accuracy using Ozaki scheme for many-core architectures D Mukunoki, T Ogita, K Ozaki Parallel Processing and Applied Mathematics: 13th International Conference …, 2020 | 26 | 2020 |
Optimization of sparse matrix-vector multiplication for CRS format on NVIDIA Kepler architecture GPUs D Mukunoki, D Takahashi Computational Science and Its Applications–ICCSA 2013: 13th International …, 2013 | 24 | 2013 |
Implementation and evaluation of triple precision BLAS subroutines on GPUs D Mukunoki, D Takahashi 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 20 | 2012 |
Performance and energy consumption of accurate and mixed-precision linear algebra kernels on GPUs D Mukunoki, T Ogita Journal of Computational and Applied Mathematics 372, 112701, 2020 | 16 | 2020 |
Implementation and evaluation of quadruple precision BLAS functions on GPUs D Mukunoki, D Takahashi Applied Parallel and Scientific Computing: 10th International Conference …, 2012 | 16 | 2012 |
Fast implementation of general matrix-vector multiplication (GEMV) on Kepler GPUs D Mukunoki, T Imamura, D Takahashi 2015 23rd Euromicro International Conference on Parallel, Distributed, and …, 2015 | 15 | 2015 |
Using quadruple precision arithmetic to accelerate Krylov subspace methods on GPUs D Mukunoki, D Takahashi Parallel Processing and Applied Mathematics: 10th International Conference …, 2014 | 14 | 2014 |
Reduced-precision floating-point formats on GPUs for high performance and energy efficient computation D Mukunoki, T Imamura 2016 IEEE International Conference on Cluster Computing (CLUSTER), 144-145, 2016 | 11 | 2016 |
Automatic thread-block size adjustment for memory-bound BLAS kernels on GPUs D Mukunoki, T Imamura, D Takahashi 2016 IEEE 10th International Symposium on Embedded Multicore/Many-core …, 2016 | 8 | 2016 |
Accurate matrix multiplication on binary128 format accelerated by Ozaki scheme D Mukunoki, K Ozaki, T Ogita, T Imamura Proceedings of the 50th International Conference on Parallel Processing, 1-11, 2021 | 7 | 2021 |
Conjugate gradient solvers with high accuracy and bit-wise reproducibility between CPU and GPU using Ozaki scheme D Mukunoki, K Ozaki, T Ogita, R Iakymchuk The International Conference on High Performance Computing in Asia-Pacific …, 2021 | 7 | 2021 |
Performance comparison of double, triple and quadruple precision real and complex BLAS subroutines on GPUs D Mukunoki, D Takahashi Proceedings of the ATIP/A* CRC Workshop on Accelerator Technologies for High …, 2012 | 6 | 2012 |
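Implementation and performance evaluation of triple- and quadruple-precision floating-point arithmetic on GPUs (in Japanese) D Mukunoki, D Takahashi IPSJ Transactions on Advanced Computing Systems (ACS) 6 (1), 66-77, 2013 | 5 | 2013 |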
Can we avoid rounding-error estimation in HPC codes and still get trustworthy results? F Jézéquel, S Graillat, D Mukunoki, T Imamura, R Iakymchuk Software Verification: 12th International Conference, VSTTE 2020, and 13th …, 2020 | 4 | 2020 |
Minimal-precision computing for high-performance, energy-efficient, and reliable computations D Mukunoki, T Imamura, Y Tan, A Koshiba, J Huthmann, K Sano, ... France-Japan-Germany trilateral workshop: Convergence of HPC and Data …, 2019 | 4 | 2019 |
Implementation and Performance Analysis of 2.5D-PDGEMM on the K Computer D Mukunoki, T Imamura Parallel Processing and Applied Mathematics: 12th International Conference …, 2018 | 4 | 2018 |
Sparse Matrix-Vector Multiplication with Reduced-Precision Memory Accessor D Mukunoki, M Kawai, T Imamura 2023 IEEE 16th International Symposium on Embedded Multicore/Many-core …, 2023 | 3 | 2023 |
Can we avoid rounding-error estimation in HPC codes and still get trustful results? F Jézéquel, S Graillat, D Mukunoki, T Imamura, R Iakymchuk | 3 | 2020 |