[BOOK][B] High performance visualization: Enabling extreme-scale scientific insight

EW Bethel, H Childs, C Hansen - 2012 - books.google.com
Visualization and analysis tools, techniques, and algorithms have undergone a rapid
evolution in recent decades to accommodate explosive growth in data size and complexity …

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms

E Solomonik, J Demmel - European Conference on Parallel Processing, 2011 - Springer
Extra memory allows parallel matrix multiplication to be done with asymptotically less
communication than Cannon's algorithm and be faster in practice. “3D” algorithms arrange …

Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction

RL Graham, D Bureddy, P Lui… - … in HPC (COMHPC), 2016 - ieeexplore.ieee.org
Increased system size and a greater reliance on utilizing system parallelism to achieve
computational needs, requires innovative system architectures to meet the simulation …

Compass: A scalable simulator for an architecture for cognitive computing

R Preissl, TM Wong, P Datta, M Flickner… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
Inspired by the function, power, and volume of the organic brain, we are developing
TrueNorth, a novel modular, non-von Neumann, ultra-low power, compact architecture …

GASNet-EX: A high-performance, portable communication library for exascale

D Bonachea, PH Hargrove - … Workshop on Languages and Compilers for …, 2018 - Springer
Partitioned Global Address Space (PGAS) models, typified by languages such as
Unified Parallel C (UPC) and Co-Array Fortran, expose one-sided communication as a key …

Experiences with a lightweight supercomputer kernel: Lessons learned from Blue Gene's CNK

M Giampapa, T Gooding, T Inglett… - SC'10: Proceedings …, 2010 - ieeexplore.ieee.org
The Petascale era has recently been ushered in and many researchers have already turned
their attention to the challenges of exascale computing. To achieve petascale computing two …

A configurable algorithm for parallel image-compositing applications

T Peterka, D Goodell, R Ross, HW Shen… - Proceedings of the …, 2009 - dl.acm.org
Collective communication operations can dominate the cost of large-scale parallel
algorithms. Image compositing in parallel scientific visualization is a reduction operation …

MPI collective communications on the Blue Gene/P supercomputer: Algorithms and optimizations

A Faraj, S Kumar, B Smith, A Mamidala… - Proceedings of the 23rd …, 2009 - dl.acm.org
The IBM Blue Gene/P (BG/P) system is a massively parallel supercomputer succeeding
BG/L, and it is based on orders of magnitude in system size and significant power …

AM++: A generalized active message framework

JJ Willcock, T Hoefler, NG Edmonds… - Proceedings of the 19th …, 2010 - dl.acm.org
Active messages have proven to be an effective approach for certain communication
problems in high performance computing. Many MPI implementations, as well as runtimes …

Scalable communication protocols for dynamic sparse data exchange

T Hoefler, C Siebert, A Lumsdaine - ACM Sigplan Notices, 2010 - dl.acm.org
Many large-scale parallel programs follow a bulk synchronous parallel (BSP) structure with
distinct computation and communication phases. Although the communication phase in …