GPUnet: Networking abstractions for GPU programs

M Silberstein, S Kim, S Huh, X Zhang, Y Hu… - ACM Transactions on …, 2016
Despite the popularity of GPUs in high-performance and scientific computing, and despite
increasingly general-purpose hardware capabilities, the use of GPUs in network servers or …

Petascale high order dynamic rupture earthquake simulations on heterogeneous supercomputers

A Heinecke, A Breuer, S Rettenberger… - SC'14: Proceedings …, 2014
We present an end-to-end optimization of the innovative Arbitrary high-order DERivative
Discontinuous Galerkin (ADER-DG) software SeisSol targeting Intel® Xeon Phi coprocessor …

BluesMPI: Efficient MPI non-blocking Alltoall offloading designs on modern BlueField smart NICs

M Bayatpour, N Sarkauskas, H Subramoni… - … Conference on High …, 2021 - Springer
In the state-of-the-art production quality MPI (Message Passing Interface) libraries,
communication progress is either performed by the main thread or a separate …

A hierarchical and contextual model for aerial image parsing

J Porway, Q Wang, SC Zhu - International journal of computer vision, 2010 - Springer
In this paper we present a hierarchical and contextual model for aerial image understanding.
Our model organizes objects (cars, roofs, roads, trees, parking lots) in aerial scenes into …

FlexDriver: A network driver for your accelerator

H Eran, M Fudim, G Malka, G Shalom… - Proceedings of the 27th …, 2022
We propose a new system design for connecting hardware and FPGA accelerators to the
network, allowing the accelerator to directly control commodity Network Interface Cards …

The MVAPICH project: Evolution and sustainability of an open source production quality MPI library for HPC

DK Panda, K Tomko, K Schulz… - … with Int'l …, 2013
MPI-2 and MPI-3) open-source libraries [?] have been designed and developed during the …

Amplification of probabilistic Boolean formulas

RB Boppana - 26th Annual Symposium on Foundations of …, 1985
The amplification of probabilistic Boolean formulas refers to combining independent copies
of such formulas to reduce the error probability. Les Valiant used the amplification method to …

Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters

K Hamidouche, A Venkatesh, AA Awan… - 2015 IEEE …, 2015
GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA
networks like InfiniBand (IB) to GPUs (referred to as "Device"). It enables IB network …

Scalable communication architecture for network-attached accelerators

S Neuwirth, D Frey, M Nuessle… - 2015 IEEE 21st …, 2015
On the road to Exascale computing, novel communication architectures are required to
overcome the limitations of host-centric accelerators. Typically, accelerator devices require a …

Exploring data migration for future deep-memory many-core systems

S Perarnau, JA Zounmevo, B Gerofi… - 2016 IEEE …, 2016
Upcoming high-performance computing (HPC) platforms will have more complex memory
hierarchies with high-bandwidth on-package memory and in the future also non-volatile …