Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics
This paper introduces a new approach to building distributed-memory graph analytics
systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies …
Cusp: A customizable streaming edge partitioner for distributed graph analytics
Graph analytics systems must analyze graphs with billions of vertices and edges which
require several terabytes of storage. Distributed-memory clusters are often used for …
Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics
Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like
D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model …
A study of partitioning policies for graph analytics on large-scale distributed platforms
Distributed-memory clusters are used for in-memory processing of very large graphs with
billions of nodes and edges. This requires partitioning the graph among the machines in the …
Abelian: A compiler for graph analytics on distributed, heterogeneous platforms
The trend towards processor heterogeneity and distributed-memory has significantly
increased the complexity of parallel programming. In addition, the mix of applications that …
TopoX: Topology refactorization for minimizing network communication in graph computations
Efficient graph partitioning is vital for high-performance graph-parallel systems. Traditional
graph partitioning methods attempt to both minimize communication cost and guarantee …
Callback-based completion notification using MPI Continuations
J Schuchart, P Samfass, C Niethammer, J Gracia… - Parallel Computing, 2021 - Elsevier
Asynchronous programming models (APM) are gaining more and more traction, allowing
applications to expose the available concurrency to a runtime system tasked with …
Efficient distribution for deep learning on large graphs
Graph neural networks (GNN) are compute intensive; thus, they are attractive for
acceleration on distributed platforms. We present DeepGalois, an efficient GNN framework …
Design and Analysis of the Network Software Stack of an Asynchronous Many-task System--The LCI parcelport of HPX
The HPX asynchronous many-task runtime system has been using TCP and MPI as its
communication backends (parcelports). We developed a new HPX parcelport using a new …
Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine
There is a growing interest in Asynchronous Many-Task (AMT) runtimes as an efficient way
to map irregular and dynamic parallel applications onto heterogeneous computing …