Gluon: A communication-optimizing substrate for distributed heterogeneous graph analytics

R Dathathri, G Gill, L Hoang, HV Dang… - Proceedings of the 39th …, 2018 - dl.acm.org
This paper introduces a new approach to building distributed-memory graph analytics
systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies …

Cusp: A customizable streaming edge partitioner for distributed graph analytics

L Hoang, R Dathathri, G Gill, K Pingali - ACM SIGOPS Operating …, 2021 - dl.acm.org
Graph analytics systems must analyze graphs with billions of vertices and edges which
require several terabytes of storage. Distributed-memory clusters are often used for …

Gluon-async: A bulk-asynchronous system for distributed and heterogeneous graph analytics

R Dathathri, G Gill, L Hoang, V Jatala… - 2019 28th …, 2019 - ieeexplore.ieee.org
Distributed graph analytics systems for CPUs, like D-Galois and Gemini, and for GPUs, like
D-IrGL and Lux, use a bulk-synchronous parallel (BSP) programming and execution model …

A study of partitioning policies for graph analytics on large-scale distributed platforms

G Gill, R Dathathri, L Hoang, K Pingali - Proceedings of the VLDB …, 2018 - dl.acm.org
Distributed-memory clusters are used for in-memory processing of very large graphs with
billions of nodes and edges. This requires partitioning the graph among the machines in the …

Abelian: A compiler for graph analytics on distributed, heterogeneous platforms

G Gill, R Dathathri, L Hoang, A Lenharth… - European Conference on …, 2018 - Springer
The trend towards processor heterogeneity and distributed-memory has significantly
increased the complexity of parallel programming. In addition, the mix of applications that …

TopoX: Topology refactorization for minimizing network communication in graph computations

Y Zhang, H Wang, M Jia, J Wang, D Li… - IEEE/ACM …, 2020 - ieeexplore.ieee.org
Efficient graph partitioning is vital for high-performance graph-parallel systems. Traditional
graph partitioning methods attempt to both minimize communication cost and guarantee …

Callback-based completion notification using MPI Continuations

J Schuchart, P Samfass, C Niethammer, J Gracia… - Parallel Computing, 2021 - Elsevier
Asynchronous programming models (APM) are gaining more and more traction, allowing
applications to expose the available concurrency to a runtime system tasked with …

Efficient distribution for deep learning on large graphs

L Hoang, X Chen, H Lee, R Dathathri, G Gill… - update, 2021 - chenxuhao.github.io
Graph neural networks (GNN) are compute intensive; thus, they are attractive for
acceleration on distributed platforms. We present DeepGalois, an efficient GNN framework …

Design and Analysis of the Network Software Stack of an Asynchronous Many-task System--The LCI parcelport of HPX

J Yan, H Kaiser, M Snir - Proceedings of the SC'23 Workshops of The …, 2023 - dl.acm.org
The HPX asynchronous many-task runtime system has been using TCP and MPI as its
communication backends (parcelports). We developed a new HPX parcelport using a new …

Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine

O Mor, G Bosilca, M Snir - … of the 52nd International Conference on …, 2023 - dl.acm.org
There is a growing interest in Asynchronous Many-Task (AMT) runtimes as an efficient way
to map irregular and dynamic parallel applications onto heterogeneous computing …