Designing non-blocking broadcast with collective offload on InfiniBand clusters: A case study...

XK Liao, ZB Pang, KF Wang, YT Lu, M Xie, J Xia… - Journal of Computer …, 2015 - Springer

In this paper, we present the Tianhe-2 interconnect network and message passing services.
We describe the architecture of the router and network interface chips, and highlight a set of …

Cited by 87 · Related articles · All 8 versions

Efficient design for MPI asynchronous progress without dedicated resources

A Ruhela, H Subramoni, S Chakraborty, M Bayatpour… - Parallel Computing, 2019 - Elsevier

The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

Cited by 14 · Related articles · All 3 versions

[PDF] The MVAPICH project: Evolution and sustainability of an open source production quality MPI library for HPC

DK Panda, K Tomko, K Schulz… - … with Int'l …, 2013 - pfigshare-u-files.s3.amazonaws.com

I. OVERVIEW OF THE MVAPICH PROJECT The MVAPICH (for MPI-1) and MVAPICH2 (for
MPI-2 and MPI-3) open-source libraries [?] have been designed and developed during the …

Cited by 69 · Related articles · All 3 versions

Energy, memory, and runtime tradeoffs for implementing collective communication operations

T Hoefler, D Moor - Supercomputing frontiers and innovations, 2014 - superfri.susu.ru

Collective operations are among the most important communication operations in shared-
and distributed-memory parallel applications. In this paper, we analyze the tradeoffs …

Cited by 56 · Related articles · All 28 versions

The TH Express high performance interconnect networks

Z Pang, M Xie, J Zhang, Y Zheng, G Wang… - Frontiers of Computer …, 2014 - Springer

Interconnection network plays an important role in scalable high performance computer
(HPC) systems. The TH Express-2 interconnect has been used in MilkyWay-2 system to …

Cited by 58 · Related articles · All 7 versions

Efficient asynchronous communication progress for MPI without dedicated resources

A Ruhela, H Subramoni, S Chakraborty… - Proceedings of the 25th …, 2018 - dl.acm.org

The overlap of computation and communication is critical for good performance of many
HPC applications. State-of-the-art designs for the asynchronous progress require specially …

Cited by 26 · Related articles · All 12 versions

Exploiting offload enabled network interfaces

S Di Girolamo, P Jolivet… - 2015 IEEE 23rd …, 2015 - ieeexplore.ieee.org

Network interface cards are one of the key components to achieve efficient parallel
performance. In the past, they have gained new functionalities such as lossless …

Cited by 32 · Related articles · All 37 versions

Designing non-blocking allreduce with collective offload on InfiniBand clusters: A case study with conjugate gradient solvers

K Kandalla, U Yang, J Keasler, T Kolev… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

Scientists across a wide range of domains increasingly rely on computer simulation for their
investigations. Such simulations often spend a majority of their run-times solving large …

Cited by 41 · Related articles · All 10 versions

Non-blocking PMI extensions for fast MPI startup

S Chakraborty, H Subramoni, A Moody… - 2015 15th IEEE/ACM …, 2015 - ieeexplore.ieee.org

An efficient implementation of the Process Management Interface (PMI) is crucial to enable
fast start-up of MPI jobs. We propose three extensions to the PMI specification: 1) a blocking …

Cited by 15 · Related articles · All 5 versions

GAPS: a genetic programming system

MD Kramer, D Zhang - Proceedings 24th Annual International …, 2000 - ieeexplore.ieee.org

Genetic programming tackles the issue of how to automatically create a working computer
program for a given problem from some initial problem statement. The goal is accomplished …

Cited by 33 · Related articles · All 6 versions