作者
Li Liu, Eric Li, Yimin Zhang, Zhizhong Tang
发表日期
2007/9/23
图书
Proceedings of the 33rd international conference on Very large data bases
页码范围
1275-1285
简介
Multi-core processors are proliferated across different domains in recent years. In this paper, we study the performance of frequent pattern mining on a modern multi-core machine. A detailed study shows that, even with the best implementation, current FP-tree based algorithms still under-utilize a multi-core system due to poor data locality and insufficient parallelism expression. We propose two techniques: a cache-conscious FP-array (frequent pattern array) and a lock-free dataset tiling parallelization mechanism to address this problem. The FP-array efficiently improves the data locality performance, and makes use of the benefits from hardware and software prefetching. The result yields an overall 4.0 speedup compared with the state-of-the-art implementation. Furthermore, to unlock the power of multi-core processor, a lockfree parallelization approach is proposed to restructure the FP-tree building algorithm. It not only eliminates the locks in building a single FP-tree with fine-grained threads, but also improves the temporal data locality performance. To summarize, with the proposed cache-conscious FP-array and lock-free parallelization enhancements, the overall FP-tree algorithm achieves a 24 fold speedup on an 8-core machine. Finally, we believe the presented techniques can be applied to other data mining tasks as well with the prevalence of multi-core processor.
引用总数
200820092010201120122013201420152016201720182019202020212022202320242107171321152016129962512
学术搜索中的文章
L Liu, E Li, Y Zhang, Z Tang - Proceedings of the 33rd international conference on …, 2007