A high-performance implementation of atomistic spin dynamics simulations on x86 CPUs

S Wu, Y Zhai, J Liu, J Huang, Z Jian, B Wong… - Proceedings of the 37th …, 2023 - dl.acm.org

General Matrix Multiplication (GEMM) is a crucial algorithm for various applications such as
machine learning and scientific computing since an efficient GEMM implementation is …

Cited by 9 · Related articles · All 3 versions

[PDF] ucr.edu

FT-BLAS: A Fault Tolerant High Performance BLAS Implementation on x86 CPUs

Y Zhai, E Giem, K Zhao, J Liu, J Huang… - … on Parallel and …, 2023 - ieeexplore.ieee.org

Basic Linear Algebra Subprograms (BLAS) serve as a foundational library for scientific
computing and machine learning. In this article, we present a new BLAS implementation, FT …

Cited by 1 · Related articles · All 6 versions

[PDF] arxiv.org

Kernel fusion in atomistic spin dynamics simulations on Nvidia GPUs using tensor core

H Chen, S Chen, JJ Turner, A Feiguin - Journal of Computational Science, 2024 - Elsevier

In atomistic spin dynamics simulations, the time cost of constructing the space- and
time-displaced pair correlation function in real space increases quadratically as the number of …

Cited by 1 · Related articles · All 3 versions

[PDF] escholarship.org

[BOOK][B] Architectural-Aware Performance Optimization: From the Foundational Math Library to Cutting-Edge Applications

Y Zhai - 2023 - search.proquest.com

Efficient performance is essential for deploying a system in the real world. This thesis
presents techniques for optimizing performance with an awareness of architecture for …

Cited by 1 · Related articles · All 2 versions

[PDF] neu.edu

Machine Learning and High-Performance Computing in Numerical Simulation of Quantum Many-Body Systems

H Chen - 2024 - search.proquest.com

This thesis focuses on the development, application, and high-performance implementation
of numerical methods for simulating quantum many-body systems. In the first chapter, I …