Quantum-centric supercomputing for materials science: A perspective on challenges and future directions
Computational models are an essential tool for the design, characterization, and discovery
of novel materials. Computationally hard tasks in materials science stretch the limits of …
MIMD Programs Execution Support on SIMD Machines: A Holistic Survey
D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …
A performance analysis of modern parallel programming models using a compute-bound application
Performance portability is becoming more-and-more important as next-generation high
performance computing systems grow increasingly diverse and heterogeneous. Several …
oneAPI open-source math library interface
M Krainiuk, M Goli, VR Pascuzzi - 2021 International Workshop …, 2021 - ieeexplore.ieee.org
To HPC and AI analytics engineers, math primitives such as basic linear algebra
subprograms or random number generators are key functionality that have highly optimized …
Under the hood of SYCL – an initial performance analysis with an unstructured-mesh CFD application
As the computing hardware landscape gets more diverse, and the complexity of hardware
grows, the need for a general purpose parallel programming model capable of developing …
AXI4MLIR: User-Driven Automatic Host Code Generation for Custom AXI-Based Accelerators
This paper addresses the need for automatic and efficient generation of host driver code for
arbitrary custom AXI-based accelerators targeting linear algebra algorithms, an important …
Performance portable Vlasov code with C++ parallel algorithm
This paper presents the performance portable implementation of a kinetic plasma simulation
code with C++ parallel algorithm to run across multiple CPUs and GPUs. Relying on the …
Sylkan: towards a Vulkan compute target platform for SYCL
P Thoman, D Gogl, T Fahringer - … of the 9th International Workshop on …, 2021 - dl.acm.org
SYCL is a modern high-level C++ programming interface which excels at expressing data
parallelism for heterogeneous hardware platforms in a programmer-friendly way, and is …
The Celerity high-level API: C++20 for accelerator clusters
P Thoman, F Tischler, P Salzmann… - International Journal of …, 2022 - Springer
Providing convenient APIs and notations for data parallelism which remain accessible for
programmers while still providing good performance has been a long-term goal of …
First SYCL implementation of the three-dimensional subsurface XCA-Flow cellular automaton and performance comparison against CUDA
D D'Ambrosio, G Terremoto, A De Rango… - Proceedings of the …, 2022 - computing-conf.org
We present the results of a first SYCL vs CUDA performance assessment for the case of the
three-dimensional XCA-Flow subsurface Extended Cellular Automata model. A grid domain …