Dualsim: Parallel subgraph enumeration in a massive graph on a single machine

H Kim, J Lee, SS Bhowmick, WS Han, JH Lee… - Proceedings of the …, 2016 - dl.acm.org
Proceedings of the 2016 International Conference on Management of Data, 2016dl.acm.org
Subgraph enumeration is important for many applications such as subgraph frequencies,
network motif discovery, graphlet kernel computation, and studying the evolution of social
networks. Most earlier work on subgraph enumeration assumes that graphs are resident in
memory, which results in serious scalability problems. Recently, efforts to enumerate all
subgraphs in a large-scale graph have seemed to enjoy some success by partitioning the
data graph and exploiting the distributed frameworks such as MapReduce and distributed …
Subgraph enumeration is important for many applications such as subgraph frequencies, network motif discovery, graphlet kernel computation, and studying the evolution of social networks. Most earlier work on subgraph enumeration assumes that graphs are resident in memory, which results in serious scalability problems. Recently, efforts to enumerate all subgraphs in a large-scale graph have seemed to enjoy some success by partitioning the data graph and exploiting the distributed frameworks such as MapReduce and distributed graph engines. However, we notice that all existing distributed approaches have serious performance problems for subgraph enumeration due to the explosive number of partial results. In this paper, we design and implement a disk-based, single machine parallel subgraph enumeration solution called DualSim that can handle massive graphs without maintaining exponential numbers of partial results. Specifically, we propose a novel concept of the dual approach for subgraph enumeration. The dual approach swaps the roles of the data graph and the query graph. Specifically, instead of fixing the matching order in the query and then matching data vertices, it fixes the data vertices by fixing a set of disk pages and then finds all subgraph matchings in these pages. This enables us to significantly reduce the number of disk reads. We conduct extensive experiments with various real-world graphs to systematically demonstrate the superiority of DualSim over state-of-the-art distributed subgraph enumeration methods. DualSim outperforms the state-of-the-art methods by up to orders of magnitude, while they fail for many queries due to explosive intermediate results.
ACM Digital Library
以上显示的是最相近的搜索结果。 查看全部搜索结果