A survey on locality sensitive hashing algorithms and their applications

O Jafari, P Maurya, P Nagarkar, KM Islam… - arXiv preprint arXiv …, 2021 - arxiv.org
Finding nearest neighbors in high-dimensional spaces is a fundamental operation in many
diverse application domains. Locality Sensitive Hashing (LSH) is one of the most popular …

Approximate Nearest Neighbor Search in High Dimensional Vector Databases: Current Research and Future Directions

Y Tian, Z Yue, R Zhang, X Zhao, B Zheng… - IEEE Data Eng …, 2023 - sites.computer.org
Approximate nearest neighbor search is an important research topic with a wide range of
applications. In this study, we first introduce the problem and review major research results …

DB-LSH 2.0: Locality-sensitive hashing with query-based dynamic bucketing

Y Tian, X Zhao, X Zhou - IEEE Transactions on Knowledge and …, 2023 - ieeexplore.ieee.org
Locality-sensitive hashing (LSH) is a promising family of methods for the high-dimensional
approximate nearest neighbor (ANN) search problem due to its sub-linear query time and …

LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data

L Patel, S Jha, C Guestrin, M Zaharia - arXiv preprint arXiv:2407.11418, 2024 - arxiv.org
The semantic capabilities of language models (LMs) have the potential to enable rich
analytics and reasoning over vast knowledge corpora. Unfortunately, existing systems lack …

Efficient approximate nearest neighbor search in multi-dimensional databases

Y Peng, B Choi, TN Chan, J Yang, J Xu - … of the ACM on Management of …, 2023 - dl.acm.org
Approximate nearest neighbor (ANN) search is a fundamental search in multi-dimensional
databases, which has numerous real-world applications, such as image retrieval …

Dumpy: A compact and adaptive index for large data series collections

Z Wang, Q Wang, P Wang, T Palpanas… - Proceedings of the ACM …, 2023 - dl.acm.org
Data series indexes are necessary for managing and analyzing the increasing amounts of
data series collections that are nowadays available. These indexes support both exact and …

FARGO: Fast maximum inner product search via global multi-probing

X Zhao, B Zheng, X Yi, X Luan, C Xie, X Zhou… - Proceedings of the …, 2023 - dl.acm.org
Maximum inner product search (MIPS) in high-dimensional spaces has wide applications
but is computationally expensive due to the curse of dimensionality. Existing studies employ …

ACORN: Performant and Predicate-Agnostic Search Over Vector Embeddings and Structured Data

L Patel, P Kraft, C Guestrin, M Zaharia - … of the ACM on Management of …, 2024 - dl.acm.org
Applications increasingly leverage mixed-modality data, and must jointly search over vector
data, such as embedded images, text and video, as well as structured data, such as …

A New Sparse Data Clustering Method Based On Frequent Items

Q Huang, P Luo, AKH Tung - Proceedings of the ACM on Management …, 2023 - dl.acm.org
Large, sparse categorical data is a natural way to represent complex data like sequences,
trees, and graphs. Such data is prevalent in many applications, e.g., Criteo released a …

HJG: An Effective Hierarchical Joint Graph for ANNS in Multi-Metric Spaces

Y Zhu, L Chen, Y Gao, R Ma, B Zheng… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Owing to the widespread deployment of smartphones and networked devices, massive
amounts of data of different types are generated every day, including numeric data, locations …