Deja Vu: Contextual sparsity for efficient LLMs at inference time

Z Liu, J Wang, T Dao, T Zhou, B Yuan… - International …, 2023 - proceedings.mlr.press
Large language models (LLMs) with hundreds of billions of parameters have sparked a new
wave of exciting AI applications. However, they are computationally expensive at inference …

Mongoose: A learnable LSH framework for efficient neural network training

B Chen, Z Liu, B Peng, Z Xu, JL Li, T Dao… - International …, 2020 - openreview.net
Recent advances by practitioners in the deep learning community have breathed new life
into Locality Sensitive Hashing (LSH), using it to reduce memory and time bottlenecks in …

Norm adjusted proximity graph for fast inner product retrieval

S Tan, Z Xu, W Zhao, H Fei, Z Zhou, P Li - Proceedings of the 27th ACM …, 2021 - dl.acm.org
Efficient inner product search on embedding vectors is often the vital stage for online ranking
services, such as recommendation and information retrieval. Recommendation algorithms …

Reverse maximum inner product search: Formulation, algorithms, and analysis

D Amagata, T Hara - ACM Transactions on the Web, 2023 - dl.acm.org
The maximum inner product search (MIPS), which finds the item with the highest inner
product with a given query user, is an essential problem in the recommendation field …

Recent Advances in Scalable Retrieval of Personalized Recommendations

DD Le, HW Lauw - 2019 - researchgate.net
Top-K recommendation seeks to deliver a personalized recommendation list of K items to a
user. The dual objectives are (1) accuracy in identifying the items a user is likely to prefer …