
HistoSketch: Fast similarity-preserving sketching of streaming histograms with concept drift

D Yang, B Li, L Rettig… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

2017 IEEE International Conference on Data Mining (ICDM), 2017•ieeexplore.ieee.org

Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the elements of a histogram are observed in a streaming manner. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, the concept-drift issue in the data streams also impairs the accurate assessment of the similarity. In this paper, we propose to overcome the above challenges with HistoSketch, a fast similarity-preserving sketching method for streaming histograms with concept drift. Specifically, HistoSketch is designed to incrementally maintain a set of compact and fixed-size sketches of streaming histograms to approximate similarity between the histograms, with the special consideration of gradually forgetting the outdated histogram elements. We evaluate HistoSketch on multiple classification tasks using both synthetic and real-world datasets. The results show that our method is able to efficiently approximate similarity for streaming histograms and quickly adapt to concept drift. Compared to full streaming histograms gradually forgetting the outdated histogram elements, HistoSketch is able to dramatically reduce the classification time (with a 7500x speedup) with only a modest loss in accuracy (about 3.5%).
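The abstract does not spell out the sketching scheme. The code below is a minimal sketch under two stated assumptions: the similarity being approximated is the weighted Jaccard (min-max) similarity, and Ioffe's Improved Consistent Weighted Sampling (ICWS) stands in for the sketching step. It decays a streaming histogram by a factor `decay` per arrival so outdated elements fade out, then builds a fixed-size sketch whose per-slot collision rate estimates that similarity. All function names are hypothetical. The code recomputes the sketch from the decayed histogram and does not reproduce HistoSketch's own incremental update.

```python
import math
import random
from collections import defaultdict


def decayed_histogram(stream, decay):
    """Hypothetical helper: multiply all weights by `decay`, then add 1 for the new element."""
    hist = defaultdict(float)
    for element in stream:
        for key in hist:
            hist[key] *= decay  # gradually forget older elements
        hist[element] += 1.0
    return dict(hist)


def weighted_jaccard(h1, h2):
    """Exact min-max (weighted Jaccard) similarity of two histograms."""
    keys = set(h1) | set(h2)
    num = sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in keys)
    den = sum(max(h1.get(k, 0.0), h2.get(k, 0.0)) for k in keys)
    return num / den if den > 0 else 1.0


def _params(element, i, seed):
    # Random variables seeded by (seed, element, slot), so every histogram
    # draws identical values for the same element in the same slot.
    rng = random.Random(f"{seed}:{element}:{i}")
    r = rng.gammavariate(2.0, 1.0)
    c = rng.gammavariate(2.0, 1.0)
    beta = rng.random()
    return r, c, beta


def icws_sketch(hist, k, seed=0):
    """ICWS sketch of size k: each slot keeps the (element, t) pair with the smallest a."""
    sketch = []
    for i in range(k):
        best = None
        for element, weight in hist.items():
            if weight <= 0:
                continue
            r, c, beta = _params(element, i, seed)
            t = math.floor(math.log(weight) / r + beta)
            y = math.exp(r * (t - beta))
            a = c / (y * math.exp(r))
            if best is None or a < best[0]:
                best = (a, element, t)
        sketch.append(None if best is None else (best[1], best[2]))
    return sketch


def estimate_similarity(s1, s2):
    """Fraction of slots where both sketches sampled the same (element, t)."""
    matches = sum(1 for a, b in zip(s1, s2) if a is not None and a == b)
    return matches / len(s1)


if __name__ == "__main__":
    h1 = decayed_histogram(["a", "b", "a", "c", "a", "d"], decay=0.9)
    h2 = decayed_histogram(["a", "b", "b", "c", "a", "e"], decay=0.9)
    s1, s2 = icws_sketch(h1, 256), icws_sketch(h2, 256)
    print(weighted_jaccard(h1, h2), estimate_similarity(s1, s2))
```

Comparing two sketches costs O(k) regardless of how many distinct elements the histograms contain. This is the kind of saving the abstract attributes to fixed-size sketches. A larger `k` lowers the variance of the estimate.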


Cited by 54 · Related articles · All 9 versions
