KNEM: A generic and scalable kernel-assisted intra-node MPI communication framework

B Goglin, S Moreaud - Journal of Parallel and Distributed Computing, 2013 - Elsevier
The multiplication of cores in today's architectures raises the importance of intra-node
communication in modern clusters and their impact on the overall parallel application …

Kernel assisted collective intra-node MPI communication among multi-core and many-core CPUs

T Ma, G Bosilca, A Bouteiller, B Goglin… - 2011 International …, 2011 - ieeexplore.ieee.org
Shared memory is among the most common approaches to implementing message passing
within multicore nodes. However, current shared memory techniques do not scale with …

Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)

B Goglin - 2014 International Conference on High Performance …, 2014 - ieeexplore.ieee.org
Modern computing platforms are increasingly complex, with multiple cores, shared caches,
and NUMA architectures. Parallel applications developers have to take locality into account …

Process distance-aware adaptive MPI collective communications

T Ma, T Herault, G Bosilca… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
Message Passing Interface (MPI) implementations provide a great flexibility to allow users to
arbitrarily bind processes to computing cores to fully exploit clusters of multicore/many-core …

Guide to the software engineering body of knowledge (SWEBOK) and the software engineering education knowledge (SEEK) - a preliminary mapping

P Bourque, JM Lavoie, A Lee, S Trudel… - … Workshop on Software …, 2002 - computer.org
Multi-dimensional MPI communications, where MPI communications have to be performed
in each dimension of a Cartesian communicator, have been frequently used in many of …

XCluster synopses for structured XML content

N Polyzotis, M Garofalakis - 22nd International Conference on …, 2006 - ieeexplore.ieee.org
We tackle the difficult problem of summarizing the path/branching structure and value
content of an XML database that comprises both numeric and textual values. We introduce a …

Using node information to implement MPI Cartesian topologies

WD Gropp - Proceedings of the 25th European MPI Users' Group …, 2018 - dl.acm.org
The MPI API provides support for Cartesian process topologies, including the option to
reorder the processes to achieve better communication performance. But MPI …

Using node and socket information to implement MPI Cartesian topologies

WD Gropp - Parallel Computing, 2019 - Elsevier
The MPI API provides support for Cartesian process topologies, including the option to
reorder the processes to achieve better communication performance. But MPI …

Turbostream: Towards low-latency data stream processing

S Wu, M Liu, S Ibrahim, H Jin, L Gu… - 2018 IEEE 38th …, 2018 - ieeexplore.ieee.org
Data Stream Processing (DSP) applications are often modelled as a directed acyclic graph:
operators with data streams among them. Inter-operator communications can have a …

Cooperative rendezvous protocols for improved performance and overlap

S Chakraborty, M Bayatpour, J Hashmi… - … Conference for High …, 2018 - ieeexplore.ieee.org
With the emergence of larger multi-/many-core clusters and new areas of HPC applications,
performance of large message communication is becoming more important. MPI libraries …