Performance-aware model for sparse matrix-matrix multiplication on the Sunway TaihuLight supercomputer

Y Chen, K Li, W Yang, G Xiao, X Xie… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental
linear operations in a wide variety of scientific applications. To implement efficient SpGEMM …

CASpMV: A customized and accelerative SpMV framework for the Sunway TaihuLight

G Xiao, K Li, Y Chen, W He… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
The Sunway TaihuLight, equipped with 10 million cores, is currently the world's third fastest
supercomputer. SpMV is one of core algorithms in many high-performance computing …

swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures

X Wang, W Liu, W Xue, L Wu - Proceedings of the 23rd ACM SIGPLAN …, 2018 - dl.acm.org
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …

SW_Qsim: A minimize-memory quantum simulator with high-performance on a new Sunway supercomputer

F Li, X Liu, Y Liu, P Zhao, Y Yang, H Shang… - Proceedings of the …, 2021 - dl.acm.org
Classical simulation of quantum computation plays a critical role in numerical studies of
quantum algorithms and the validation of quantum devices. Here, we introduce SW_Qsim, a …

Parallelization and optimization of NSGA-II on Sunway TaihuLight system

X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …

C-testing and efficient fault localization for AI accelerators

A Chaudhuri, C Liu, X Fan… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Accelerators for machine learning [artificial intelligence (AI)] inferencing applications are
homogeneous designs composed of identical cores. Each core or processing element (PE) …

aeSpTV: An adaptive and efficient framework for sparse tensor-vector product kernel on a high-performance computing platform

Y Chen, G Xiao, MT Özsu, C Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Multi-dimensional, large-scale, and sparse data, which can be neatly represented by sparse
tensors, are increasingly used in various applications such as data analysis and machine …

swCaffe: A parallel framework for accelerating deep learning applications on Sunway TaihuLight

L Li, J Fang, H Fu, J Jiang, W Zhao… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
This paper reports our efforts on swCaffe, a high-efficient parallel framework for accelerating
deep neural networks (DNNs) training on Sunway TaihuLight, one of the fastest …

Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture

J Li, L Li, Q Wang, W Xue, J Liang, J Shi - Parallel Computing, 2024 - Elsevier
Large-scale sparse linear equation solver plays an important role in both numerical
simulation and artificial intelligence, and sparse triangular equation solver is a key step in …

xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor

F Liu, W Ma, Y Zhao, D Chen, Y Hu, Q Lu… - CCF Transactions on …, 2023 - Springer
High performance extended math library is used by many scientific engineering and artificial
intelligence applications, which usually involves many common mathematical computations …