High performance interconnect network for Tianhe system

XK Liao, ZB Pang, KF Wang, YT Lu, M Xie, J Xia… - Journal of Computer …, 2015 - Springer
In this paper, we present the Tianhe-2 interconnect network and message passing services.
We describe the architecture of the router and network interface chips, and highlight a set of …

Efficient design for MPI asynchronous progress without dedicated resources

A Ruhela, H Subramoni, S Chakraborty, M Bayatpour… - Parallel Computing, 2019 - Elsevier
The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

The MVAPICH project: Evolution and sustainability of an open source production quality MPI library for HPC

DK Panda, K Tomko, K Schulz… - … with Int'l …, 2013 - pfigshare-u-files.s3.amazonaws.com
I. OVERVIEW OF THE MVAPICH PROJECT The MVAPICH (for MPI-1) and MVAPICH2 (for
MPI-2 and MPI-3) open-source libraries [?] have been designed and developed during the …

Energy, memory, and runtime tradeoffs for implementing collective communication operations

T Hoefler, D Moor - Supercomputing frontiers and innovations, 2014 - superfri.susu.ru
Collective operations are among the most important communication operations in shared-
and distributed-memory parallel applications. In this paper, we analyze the tradeoffs …

The TH Express high performance interconnect networks

Z Pang, M Xie, J Zhang, Y Zheng, G Wang… - Frontiers of Computer …, 2014 - Springer
Interconnection network plays an important role in scalable high performance computer
(HPC) systems. The TH Express-2 interconnect has been used in MilkyWay-2 system to …

Efficient asynchronous communication progress for MPI without dedicated resources

A Ruhela, H Subramoni, S Chakraborty… - Proceedings of the 25th …, 2018 - dl.acm.org
The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

Exploiting offload enabled network interfaces

S Di Girolamo, P Jolivet… - 2015 IEEE 23rd …, 2015 - ieeexplore.ieee.org
Network interface cards are one of the key components to achieve efficient parallel
performance. In the past, they have gained new functionalities such as lossless …

Designing non-blocking allreduce with collective offload on InfiniBand clusters: A case study with conjugate gradient solvers

K Kandalla, U Yang, J Keasler, T Kolev… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Scientists across a wide range of domains increasingly rely on computer simulation for their
investigations. Such simulations often spend a majority of their run-times solving large …

Non-blocking PMI extensions for fast MPI startup

S Chakraborty, H Subramoni, A Moody… - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org
An efficient implementation of the Process Management Interface (PMI) is crucial to enable
fast start-up of MPI jobs. We propose three extensions to the PMI specification: 1) a blocking …

Gaps: a genetic programming system

MD Kramer, D Zhang - Proceedings 24th Annual International …, 2000 - ieeexplore.ieee.org
Genetic programming tackles the issue of how to automatically create a working computer
program for a given problem from some initial problem statement. The goal is accomplished …