Sparse, dense, and attentional representations for text retrieval
Dual encoders perform retrieval by encoding documents and queries into dense low-dimensional vectors, scoring each document by its inner product with the query. We …
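The scoring scheme this snippet describes can be illustrated with a toy sketch. The vectors below are hand-picked stand-ins for learned encoder outputs, and `retrieve` is a hypothetical helper, not from the paper:

```python
# Toy dual-encoder retrieval: documents and queries are dense vectors,
# and each document is scored by its inner product with the query.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def retrieve(query_vec, doc_vecs, k=2):
    """Return indices of the top-k documents by inner-product score."""
    scores = [(dot(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scores.sort(reverse=True)
    return [i for _, i in scores[:k]]

docs = [
    [0.9, 0.1, 0.0],  # doc 0
    [0.1, 0.8, 0.3],  # doc 1
    [0.0, 0.2, 0.9],  # doc 2
]
query = [1.0, 0.0, 0.1]
print(retrieve(query, docs, k=1))  # doc 0 scores highest: [0]
```

In practice the document vectors are precomputed once, so query-time cost is a single inner product per candidate (or an approximate nearest-neighbor search).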
Sparser Johnson-Lindenstrauss transforms
We give two different and simple constructions for dimensionality reduction in ℓ2 via linear mappings that are sparse: only an O(ε)-fraction of entries in each column of our embedding …
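A minimal sketch in this spirit, assuming a construction where each column of the m x d sketch matrix gets exactly s nonzero entries of ±1/√s in random rows (the paper's actual constructions differ in detail):

```python
import math
import random

def sparse_jl_matrix(m, d, s, seed=0):
    """m x d sketch matrix in which each column has exactly s nonzero
    entries of +/- 1/sqrt(s), placed in s random rows. Mirrors the
    sparse-JL idea of few nonzeros per column; not the paper's exact
    construction."""
    rng = random.Random(seed)
    S = [[0.0] * d for _ in range(m)]
    for j in range(d):
        for r in rng.sample(range(m), s):
            S[r][j] = rng.choice((-1.0, 1.0)) / math.sqrt(s)
    return S

def apply_sketch(S, x):
    return [sum(s_rj * xj for s_rj, xj in zip(row, x)) for row in S]

def norm(v):
    return math.sqrt(sum(a * a for a in v))

x = [1.0] * 64
S = sparse_jl_matrix(m=32, d=64, s=4, seed=1)
print(norm(apply_sketch(S, x)) / norm(x))  # concentrates around 1
```

Sparsity matters because applying the sketch to a vector with t nonzeros then costs O(s·t) rather than O(m·t).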
Optimality of the Johnson-Lindenstrauss lemma
For any d, n ≥ 2 and 1/(min{n, d})^0.4999 < ε < 1, we show the existence of a set of n vectors X ⊂ ℝ^d such that any embedding f: X → ℝ^m satisfying ∀x, y ∈ X, (1-ε)∥x-y∥₂² ≤ ∥f(x)-f …
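The distortion condition truncated in the snippet is the standard Johnson-Lindenstrauss guarantee, which in full reads:

```latex
\forall\, x, y \in X:\quad
(1-\varepsilon)\,\lVert x - y\rVert_2^2
\;\le\; \lVert f(x) - f(y)\rVert_2^2
\;\le\; (1+\varepsilon)\,\lVert x - y\rVert_2^2 .
```

The paper asks how small the target dimension m can be for an embedding f meeting this guarantee on a worst-case point set.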
Optimal approximate matrix product in terms of stable rank
MB Cohen, J Nelson, DP Woodruff - arXiv preprint arXiv:1507.02268, 2015 - arxiv.org
We prove, using the subspace embedding guarantee in a black box way, that one can
achieve the spectral norm guarantee for approximate matrix multiplication with a …
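The sketch-and-multiply pattern the abstract refers to, approximating A^T B by (SA)^T (SB), can be sketched with a plain random sign matrix S. This is a generic choice for illustration only, not the stable-rank-optimal construction the paper analyzes:

```python
import random

def sign_sketch(m, n, seed=0):
    """m x n random sign matrix scaled by 1/sqrt(m), so E[S^T S] = I
    and (SA)^T (SB) is an unbiased estimate of A^T B."""
    rng = random.Random(seed)
    inv = 1.0 / m ** 0.5
    return [[rng.choice((-inv, inv)) for _ in range(n)] for _ in range(m)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# A and B share n = 40 rows; approximate A^T B by (SA)^T (SB).
A = [[1.0, 0.0]] * 20 + [[0.0, 1.0]] * 20
B = A
exact = matmul(transpose(A), B)  # equals [[20.0, 0.0], [0.0, 20.0]]
S = sign_sketch(m=200, n=40, seed=3)
approx = matmul(transpose(matmul(S, A)), matmul(S, B))
print(exact)
```

The point of sketching is that when n is huge, both SA and SB are small (m rows), so the product is cheap while the error scales with the sketch size.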
Oblivious dimension reduction for k-means: beyond subspaces and the Johnson-Lindenstrauss lemma
We show that for n points in d-dimensional Euclidean space, a data-oblivious random projection of the columns onto m ∈ O((log k + log log n) ε^-6 log(1/ε)) dimensions is sufficient to …
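A toy demonstration of the data-oblivious setting, assuming a plain random sign projection (the helper names are illustrative, not from the paper): the projection is drawn without looking at the data, yet well-separated clusters remain separated in the reduced space.

```python
import math
import random

def project(points, m, seed=0):
    """Data-oblivious random sign projection of d-dim points to m dims."""
    rng = random.Random(seed)
    d = len(points[0])
    R = [[rng.choice((-1.0, 1.0)) / math.sqrt(m) for _ in range(d)]
         for _ in range(m)]
    return [[sum(R[r][j] * p[j] for j in range(d)) for r in range(m)]
            for p in points]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(pts):
    return [sum(p[j] for p in pts) / len(pts) for j in range(len(pts[0]))]

# Two well-separated Gaussian clusters in 100 dimensions.
rng = random.Random(1)
cluster0 = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(10)]
cluster1 = [[6.0 + rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(10)]
proj = project(cluster0 + cluster1, m=10, seed=2)
m0, m1 = mean(proj[:10]), mean(proj[10:])
# After projecting to 10 dims, each point is still nearer its own mean.
ok = all(sq_dist(p, m0) < sq_dist(p, m1) for p in proj[:10]) and \
     all(sq_dist(p, m1) < sq_dist(p, m0) for p in proj[10:])
print(ok)
```

The paper's point is stronger than this demo: the sufficient dimension depends on k and only doubly-logarithmically on n, beating the generic Johnson-Lindenstrauss bound for clustering.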
Coresets-methods and history: A theoreticians design pattern for approximation and streaming algorithms
A Munteanu, C Schwiegelshohn - KI-Künstliche Intelligenz, 2018 - Springer
We present a technical survey on the state-of-the-art approaches in data reduction and the
coreset framework. These include geometric decompositions, gradient methods, random …
Principal component analysis and higher correlations for distributed data
R Kannan, S Vempala… - Conference on Learning …, 2014 - proceedings.mlr.press
We consider algorithmic problems in the setting in which the input data has been partitioned
arbitrarily on many servers. The goal is to compute a function of all the data, and the …
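A standard baseline in this arbitrarily-partitioned setting, offered only as a sketch and not as the paper's algorithm: for PCA, each server can send its local d x d Gram matrix, and the coordinator sums them to recover A^T A of the full data with communication independent of the number of rows.

```python
def local_gram(rows):
    """Each server computes the Gram matrix A_i^T A_i of its local rows."""
    d = len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) for b in range(d)]
            for a in range(d)]

def add(M, N):
    return [[x + y for x, y in zip(rm, rn)] for rm, rn in zip(M, N)]

# Rows of a data matrix partitioned arbitrarily across three "servers".
server_data = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[3.0, 0.0]],
    [[1.0, 1.0], [2.0, 2.0]],
]
total = local_gram(server_data[0])
for part in server_data[1:]:
    total = add(total, local_gram(part))
# `total` equals A^T A of the full 5 x 2 matrix; its top eigenvectors
# are the principal components, at d x d communication per server.
print(total)  # [[15.0, 7.0], [7.0, 10.0]]
```

Research in this area, including the surveyed paper, aims to beat this baseline's d x d cost per server when only a few principal components are needed.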
BPTree: An ℓ2 Heavy Hitters Algorithm Using Constant Memory
The task of finding heavy hitters is one of the best known and well studied problems in the
area of data streams. One is given a list i_1, i_2, …, i_m ∈ [n] and the goal is to identify the items …
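To illustrate the streaming formulation (though not BPTree itself, which achieves the stronger ℓ2 guarantee), here is the classic Misra-Gries algorithm: it finds every item occurring more than m/k times in a stream of length m using only k-1 counters.

```python
def misra_gries(stream, k):
    """Misra-Gries summary: returns a dict whose keys include every item
    appearing more than len(stream)/k times, using at most k-1 counters.
    A classic l1 heavy-hitters algorithm, shown for illustration only."""
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # Counters full: decrement everything, dropping zeroed items.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

stream = [1] * 50 + [2] * 30 + list(range(3, 23))  # length 100
print(sorted(misra_gries(stream, k=4)))  # [1, 2]
```

Items 1 and 2 occur more than 100/4 = 25 times, so they must survive; the twenty singleton items cannot all be tracked and are pruned by the decrement step.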
Derandomizing logspace with a small shared hard drive
E Pyne - 39th Computational Complexity Conference (CCC …, 2024 - drops.dagstuhl.de
We obtain new catalytic algorithms for space-bounded derandomization. In the catalytic
computation model introduced by (Buhrman, Cleve, Koucký, Loff, and Speelman STOC …
Real-valued embeddings and sketches for fast distance and similarity estimation
DA Rachkovskij - Cybernetics and Systems Analysis, 2016 - Springer
This survey article considers methods and algorithms for fast estimation of data
distance/similarity measures from real-valued vectors of small dimension formed from the data. The …