Efficient processing of very large graphs in a small cluster

D Yan, Y Huang, J Cheng, H Wu - arXiv preprint arXiv:1601.05590, 2016 - arxiv.org
D Yan, Y Huang, J Cheng, H Wu
arXiv preprint arXiv:1601.05590, 2016arxiv.org
Inspired by the success of Google's Pregel, many systems have been developed recently for
iterative computation over big graphs. These systems provide a user-friendly vertex-centric
programming interface, where a programmer only needs to specify the behavior of one
generic vertex when developing a parallel graph algorithm. However, most existing systems
require the input graph to reside in memories of the machines in a cluster, and the few out-of-
core systems suffer from problems such as poor efficiency for sparse computation workload …
Inspired by the success of Google's Pregel, many systems have been developed recently for iterative computation over big graphs. These systems provide a user-friendly vertex-centric programming interface, where a programmer only needs to specify the behavior of one generic vertex when developing a parallel graph algorithm. However, most existing systems require the input graph to reside in memories of the machines in a cluster, and the few out-of-core systems suffer from problems such as poor efficiency for sparse computation workload, high demand on network bandwidth, and expensive cost incurred by external-memory join and group-by. In this paper, we introduce the GraphD system for a user to process very large graphs with ordinary computing resources. GraphD fully overlaps computation with communication, by streaming edges and messages on local disks, while transmitting messages in parallel. For a broad class of Pregel algorithms where message combiner is applicable, GraphD eliminates the need of any expensive external-memory join or group-by. These key techniques allow GraphD to achieve comparable performance to in-memory Pregel-like systems without keeping edges and messages in memories. We prove that to process a graph G=(V, E) with n machines using GraphD, each machine only requires O(|V|/n) memory space, allowing GraphD to scale to very large graphs with a small cluster. Extensive experiments show that GraphD beats existing out-of-core systems by orders of magnitude, and achieves comparable performance to in-memory systems running with enough memories.
arxiv.org
以上显示的是最相近的搜索结果。 查看全部搜索结果