[Book][B] High performance visualization: Enabling extreme-scale scientific insight
Visualization and analysis tools, techniques, and algorithms have undergone a rapid
evolution in recent decades to accommodate explosive growth in data size and complexity …
Communication-optimal parallel 2.5 D matrix multiplication and LU factorization algorithms
E Solomonik, J Demmel - European Conference on Parallel Processing, 2011 - Springer
Extra memory allows parallel matrix multiplication to be done with asymptotically less
communication than Cannon's algorithm and be faster in practice. “3D” algorithms arrange …
Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction
RL Graham, D Bureddy, P Lui… - … in HPC (COMHPC), 2016 - ieeexplore.ieee.org
Increased system size and a greater reliance on utilizing system parallelism to achieve
computational needs, requires innovative system architectures to meet the simulation …
Compass: A scalable simulator for an architecture for cognitive computing
Inspired by the function, power, and volume of the organic brain, we are developing
TrueNorth, a novel modular, non-von Neumann, ultra-low power, compact architecture …
GASNet-EX: A high-performance, portable communication library for exascale
D Bonachea, PH Hargrove - … Workshop on Languages and Compilers for …, 2018 - Springer
Abstract Partitioned Global Address Space (PGAS) models, typified by languages such as
Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key …
Experiences with a lightweight supercomputer kernel: Lessons learned from Blue Gene's CNK
M Giampapa, T Gooding, T Inglett… - SC'10: Proceedings …, 2010 - ieeexplore.ieee.org
The Petascale era has recently been ushered in and many researchers have already turned
their attention to the challenges of exascale computing. To achieve petascale computing two …
A configurable algorithm for parallel image-compositing applications
Collective communication operations can dominate the cost of large-scale parallel
algorithms. Image compositing in parallel scientific visualization is a reduction operation …
MPI collective communications on the Blue Gene/P supercomputer: Algorithms and optimizations
The IBM Blue Gene/P (BG/P) system is a massively parallel supercomputer succeeding
BG/L, and it is based on orders of magnitude in system size and significant power …
AM++ a generalized active message framework
JJ Willcock, T Hoefler, NG Edmonds… - Proceedings of the 19th …, 2010 - dl.acm.org
Active messages have proven to be an effective approach for certain communication
problems in high performance computing. Many MPI implementations, as well as runtimes …
Scalable communication protocols for dynamic sparse data exchange
T Hoefler, C Siebert, A Lumsdaine - ACM Sigplan Notices, 2010 - dl.acm.org
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with
distinct computation and communication phases. Although the communication phase in …