Powerlyra: Differentiated graph computation and partitioning on skewed graphs

R Chen, J Shi, Y Chen, B Zang, H Guan… - ACM Transactions on …, 2019 - dl.acm.org
R Chen, J Shi, Y Chen, B Zang, H Guan, H Chen
ACM Transactions on Parallel Computing (TOPC), 2019dl.acm.org
Natural graphs with skewed distributions raise unique challenges to distributed graph
computation and partitioning. Existing graph-parallel systems usually use a “one-size-fits-all”
design that uniformly processes all vertices, which either suffer from notable load imbalance
and high contention for high-degree vertices (eg, Pregel and GraphLab) or incur high
communication cost and memory consumption even for low-degree vertices (eg,
PowerGraph and GraphX). In this article, we argue that skewed distributions in natural …
Natural graphs with skewed distributions raise unique challenges to distributed graph computation and partitioning. Existing graph-parallel systems usually use a “one-size-fits-all” design that uniformly processes all vertices, which either suffer from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab) or incur high communication cost and memory consumption even for low-degree vertices (e.g., PowerGraph and GraphX). In this article, we argue that skewed distributions in natural graphs also necessitate differentiated processing on high-degree and low-degree vertices. We then introduce PowerLyra, a new distributed graph processing system that embraces the best of both worlds of existing graph-parallel systems. Specifically, PowerLyra uses centralized computation for low-degree vertices to avoid frequent communications and distributes the computation for high-degree vertices to balance workloads. PowerLyra further provides an efficient hybrid graph partitioning algorithm (i.e., hybrid-cut) that combines edge-cut (for low-degree vertices) and vertex-cut (for high-degree vertices) with heuristics. To improve cache locality of inter-node graph accesses, PowerLyra further provides a locality-conscious data layout optimization. PowerLyra is implemented based on the latest GraphLab and can seamlessly support various graph algorithms running in both synchronous and asynchronous execution modes. A detailed evaluation on three clusters using various graph-analytics and MLDM (Machine Learning and Data Mining) applications shows that PowerLyra outperforms PowerGraph by up to 5.53X (from 1.24X) and 3.26X (from 1.49X) for real-world and synthetic graphs, respectively, and is much faster than other systems like GraphX and Giraph, yet with much less memory consumption. A porting of hybrid-cut to GraphX further confirms the efficiency and generality of PowerLyra.
ACM Digital Library
以上显示的是最相近的搜索结果。 查看全部搜索结果