Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform
Y Umuroglu, D Morrison, M Jahre - 2015 25th International …, 2015 - ieeexplore.ieee.org
Y Umuroglu, D Morrison, M Jahre
2015 25th International Conference on Field Programmable Logic and …, 2015•ieeexplore.ieee.orgLarge and sparse small-world graphs are ubiquitous across many scientific domains from
bioinformatics to computer science. As these graphs grow in scale, traversal algorithms such
as breadth-first search (BFS), fundamental to many graph processing applications and
metrics, become more costly to compute. The cause is attributed to poor temporal and
spatial locality due to the inherently irregular memory access patterns of these algorithms. A
large body of research has targeted accelerating and parallelizing BFS on a variety of …
bioinformatics to computer science. As these graphs grow in scale, traversal algorithms such
as breadth-first search (BFS), fundamental to many graph processing applications and
metrics, become more costly to compute. The cause is attributed to poor temporal and
spatial locality due to the inherently irregular memory access patterns of these algorithms. A
large body of research has targeted accelerating and parallelizing BFS on a variety of …
Large and sparse small-world graphs are ubiquitous across many scientific domains from bioinformatics to computer science. As these graphs grow in scale, traversal algorithms such as breadth-first search (BFS), fundamental to many graph processing applications and metrics, become more costly to compute. The cause is attributed to poor temporal and spatial locality due to the inherently irregular memory access patterns of these algorithms. A large body of research has targeted accelerating and parallelizing BFS on a variety of computing platforms, including hybrid CPU-GPU approaches for exploiting the small-world property. In the same spirit, we show how a single-die FPGA-CPU heterogeneous device can be used to leverage the varying degree of parallelism in small-world graphs. Additionally, we demonstrate how dense rather than sparse treatment of the BFS frontier vector yields simpler memory access patterns for BFS, trading redundant computation for DRAM bandwidth utilization and faster graph exploration. On a range of synthetic small-world graphs, our hybrid approach performs 7.8× better than software-only and 2× better than accelerator-only implementations. We achieve an average traversal speed of 172 MTEPS (millions of traversed edges per second) on the ZedBoard platform, which is more than twice as effective as the best previously published FPGA BFS implementation in terms of traversals per bandwidth.
ieeexplore.ieee.org
以上显示的是最相近的搜索结果。 查看全部搜索结果