Sparse, dense, and attentional representations for text retrieval

Y Luan, J Eisenstein, K Toutanova… - Transactions of the …, 2021 - direct.mit.edu
Dual encoders perform retrieval by encoding documents and queries into dense low-
dimensional vectors, scoring each document by its inner product with the query. We …

Sparser Johnson-Lindenstrauss transforms

DM Kane, J Nelson - Journal of the ACM (JACM), 2014 - dl.acm.org
We give two different and simple constructions for dimensionality reduction in ℓ_2 via linear
mappings that are sparse: only an O(ε)-fraction of entries in each column of our embedding …

Optimality of the Johnson-Lindenstrauss lemma

KG Larsen, J Nelson - 2017 IEEE 58th Annual Symposium on …, 2017 - ieeexplore.ieee.org
For any d, n ≥ 2 and 1/(min{n, d})^0.4999 < ε < 1, we show the existence of a set of n vectors
X ⊂ ℝ^d such that any embedding f: X → ℝ^m satisfying ∀ x, y ∈ X, (1-ε)∥x-y∥_2^2 ≤ ∥f(x)-f …

Optimal approximate matrix product in terms of stable rank

MB Cohen, J Nelson, DP Woodruff - arXiv preprint arXiv:1507.02268, 2015 - arxiv.org
We prove, using the subspace embedding guarantee in a black box way, that one can
achieve the spectral norm guarantee for approximate matrix multiplication with a …

Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma

L Becchetti, M Bury, V Cohen-Addad… - Proceedings of the 51st …, 2019 - dl.acm.org
We show that for n points in d-dimensional Euclidean space, a data oblivious random
projection of the columns onto m ∈ O((log k + log log n) ε^{-6} log(1/ε)) dimensions is sufficient to …

Coresets-methods and history: A theoreticians design pattern for approximation and streaming algorithms

A Munteanu, C Schwiegelshohn - KI-Künstliche Intelligenz, 2018 - Springer
We present a technical survey on the state of the art approaches in data reduction and the
coreset framework. These include geometric decompositions, gradient methods, random …

Principal component analysis and higher correlations for distributed data

R Kannan, S Vempala… - Conference on Learning …, 2014 - proceedings.mlr.press
We consider algorithmic problems in the setting in which the input data has been partitioned
arbitrarily on many servers. The goal is to compute a function of all the data, and the …

BPTree: An ℓ2 Heavy Hitters Algorithm Using Constant Memory

V Braverman, SR Chestnut, N Ivkin, J Nelson… - Proceedings of the 36th …, 2017 - dl.acm.org
The task of finding heavy hitters is one of the best known and well studied problems in the
area of data streams. One is given a list i_1, i_2, ..., i_m ∈ [n] and the goal is to identify the items …

Derandomizing logspace with a small shared hard drive

E Pyne - 39th Computational Complexity Conference (CCC …, 2024 - drops.dagstuhl.de
We obtain new catalytic algorithms for space-bounded derandomization. In the catalytic
computation model introduced by (Buhrman, Cleve, Koucký, Loff, and Speelman STOC …

Real-valued embeddings and sketches for fast distance and similarity estimation

DA Rachkovskij - Cybernetics and Systems Analysis, 2016 - Springer
This survey article considers methods and algorithms for fast estimation of data
distance/similarity measures from formed real-valued vectors of small dimension. The …