Performance-aware model for sparse matrix-matrix multiplication on the Sunway TaihuLight supercomputer
General sparse matrix-sparse matrix multiplication (SpGEMM) is one of the fundamental
linear operations in a wide variety of scientific applications. To implement efficient SpGEMM …
CASpMV: A customized and accelerative SpMV framework for the Sunway TaihuLight
The Sunway TaihuLight, equipped with 10 million cores, is currently the world's third fastest
supercomputer. SpMV is one of the core algorithms in many high-performance computing …
swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …
SW_Qsim: A minimize-memory quantum simulator with high-performance on a new Sunway supercomputer
F Li, X Liu, Y Liu, P Zhao, Y Yang, H Shang… - Proceedings of the …, 2021 - dl.acm.org
Classical simulation of quantum computation plays a critical role in numerical studies of
quantum algorithms and the validation of quantum devices. Here, we introduce SW_Qsim, a …
Parallelization and optimization of NSGA-II on Sunway TaihuLight system
X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …
C-testing and efficient fault localization for AI accelerators
A Chaudhuri, C Liu, X Fan… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Accelerators for machine learning [artificial intelligence (AI)] inferencing applications are
homogeneous designs composed of identical cores. Each core or processing element (PE) …
aeSpTV: An adaptive and efficient framework for sparse tensor-vector product kernel on a high-performance computing platform
Multi-dimensional, large-scale, and sparse data, which can be neatly represented by sparse
tensors, are increasingly used in various applications such as data analysis and machine …
swCaffe: A parallel framework for accelerating deep learning applications on Sunway TaihuLight
This paper reports our efforts on swCaffe, a high-efficient parallel framework for accelerating
deep neural networks (DNNs) training on Sunway TaihuLight, one of the fastest …
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture
Large-scale sparse linear equation solver plays an important role in both numerical
simulation and artificial intelligence, and sparse triangular equation solver is a key step in …
xMath2.0: a high-performance extended math library for SW26010-Pro many-core processor
F Liu, W Ma, Y Zhao, D Chen, Y Hu, Q Lu… - CCF Transactions on …, 2023 - Springer
High performance extended math library is used by many scientific engineering and artificial
intelligence applications, which usually involves many common mathematical computations …