LoSHa: A general framework for scalable locality sensitive hashing

J Li, J Cheng, F Yang, Y Huang, Y Zhao, X Yan, R Zhao
Proceedings of the 40th International ACM SIGIR Conference on Research and …, 2017 - dl.acm.org
Locality Sensitive Hashing (LSH) algorithms are widely adopted to index similar items in high-dimensional space for approximate nearest neighbor search. As the volume of real-world datasets keeps growing, it has become necessary to develop distributed LSH solutions. Implementing a distributed LSH algorithm from scratch incurs high development costs, so most existing solutions are developed on general-purpose platforms such as Hadoop and Spark. However, we argue that these platforms are both hard to use for programming LSH algorithms and inefficient for LSH computation. We propose LoSHa, a distributed computing framework that reduces development cost through a tailor-made, general programming interface and achieves high efficiency through LSH-specific system implementation and optimizations. We show that many LSH algorithms can be easily expressed in LoSHa's API. We evaluate LoSHa and compare it with general-purpose platforms on the same LSH algorithms. Our results show that LoSHa can be an order of magnitude faster, while the implementations on LoSHa are also more intuitive and require only a few lines of code.
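For readers unfamiliar with how LSH indexes similar items, the following is a minimal single-machine sketch of one common LSH family (random hyperplanes for cosine similarity). It is an illustration only: it is not LoSHa's API and not the implementation evaluated in the paper, and every function name and parameter here (`make_hyperplanes`, `signature`, `build_index`, `query`, `num_bits`) is hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random Gaussian hyperplanes in dim dimensions (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """Hash a vector to a tuple of bits: one sign bit per hyperplane."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(items, hyperplanes):
    """Group item ids into buckets keyed by their bit signature."""
    buckets = defaultdict(list)
    for item_id, vector in items.items():
        buckets[signature(vector, hyperplanes)].append(item_id)
    return buckets


def query(vector, buckets, hyperplanes):
    """Return candidate ids whose signature matches the query's signature."""
    return list(buckets.get(signature(vector, hyperplanes), []))


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=8, dim=3)
    items = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-1.0, 0.5, -2.0]}
    index = build_index(items, planes)
    # "a" and "b" point in the same direction, so they always share a bucket.
    print(query([0.5, 1.0, 1.5], index, planes))
```

Vectors that point in similar directions tend to fall on the same side of most hyperplanes and therefore land in the same bucket, so a query only needs to compare against its bucket's candidates rather than the whole dataset. A distributed setting would additionally have to partition buckets and route queries across machines, which this sketch does not attempt.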