Histosketch: Fast similarity-preserving sketching of streaming histograms with concept drift
2017 IEEE International Conference on Data Mining (ICDM), 2017 • ieeexplore.ieee.org
Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is challenging for streaming data, where the elements of a histogram arrive one at a time. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, concept drift in the data streams impairs accurate assessment of the similarity. In this paper, we propose to overcome these challenges with HistoSketch, a fast similarity-preserving sketching method for streaming histograms with concept drift. Specifically, HistoSketch incrementally maintains a set of compact, fixed-size sketches of streaming histograms to approximate the similarity between the histograms, while gradually forgetting outdated histogram elements. We evaluate HistoSketch on multiple classification tasks using both synthetic and real-world datasets. The results show that our method efficiently approximates similarity for streaming histograms and quickly adapts to concept drift. Compared with full streaming histograms that gradually forget outdated histogram elements in the same way, HistoSketch dramatically reduces the classification time (a 7500x speedup) with only a modest loss in accuracy (about 3.5%).
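The abstract does not spell out how the sketches are built, so the following is only a minimal illustration of the general idea (a fixed-size sketch that is updated per arriving element and forgets old elements), not the paper's algorithm. It swaps in a simple weighted-sampling scheme: each slot keeps the element minimizing -ln(u)/w, where u is a hash-derived value and w is the element's exponentially decayed count. The names `DecayingSketch`, `_unit_hash`, `similarity`, and the `decay` parameter are hypothetical, and for simplicity the decayed counts are kept exactly in a dictionary, so this version does not bound memory.

```python
import hashlib
import math


def _unit_hash(element, slot):
    """Deterministic pseudo-random value in (0, 1] for an (element, slot) pair."""
    digest = hashlib.sha256(f"{slot}:{element}".encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)


class DecayingSketch:
    """Illustrative fixed-size sketch of a streaming histogram (assumed design)."""

    def __init__(self, size=64, decay=0.01):
        self.size = size
        self.scale = math.exp(-decay)    # weight multiplier applied per arrival
        self.weights = {}                # exact decayed counts (simplification)
        self.elements = [None] * size    # sampled element per slot
        self.values = [math.inf] * size  # current minimum hash value per slot

    def add(self, element):
        # Forgetting: multiplying every weight by `scale` divides every
        # candidate value -ln(u)/w by `scale`, so the stored minima are
        # rescaled uniformly and each slot's winner stays the same.
        for key in self.weights:
            self.weights[key] *= self.scale
        self.values = [v / self.scale for v in self.values]

        # The arriving element gains weight, which can only lower its values,
        # so comparing against the stored minimum keeps each slot exact.
        w = self.weights.get(element, 0.0) + 1.0
        self.weights[element] = w
        for j in range(self.size):
            candidate = -math.log(_unit_hash(element, j)) / w
            if candidate < self.values[j]:
                self.values[j] = candidate
                self.elements[j] = element


def similarity(a, b):
    """Fraction of slots on which two equally sized sketches agree."""
    matches = sum(1 for x, y in zip(a.elements, b.elements) if x == y)
    return matches / len(a.elements)
```

Two sketches fed the same stream agree on every slot, and a sketch whose stream switches from one element to another drifts toward a sketch that saw only the newer element, at a rate governed by `decay`. Each update here costs time proportional to the number of distinct elements seen, because the dictionary is rescaled explicitly.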