Sparse, dense, and attentional representations for text retrieval
Dual encoders perform retrieval by encoding documents and queries into dense
low-dimensional vectors, scoring each document by its inner product with the query. We …
The specious art of single-cell genomics
Dimensionality reduction is standard practice for filtering noise and identifying relevant
features in large-scale data analyses. In biology, single-cell genomics studies typically begin …
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee
Given a matrix $A \in \mathbb{R}^{n \times d}$ and a vector $b \in \mathbb{R}^n$, we
consider the regression problem with $\ell_\infty$ guarantees: finding a vector …
A new coreset framework for clustering
V Cohen-Addad, D Saulpic… - Proceedings of the 53rd …, 2021 - dl.acm.org
Given a metric space, the (k, z)-clustering problem consists of finding k centers such that the
sum of the distances raised to the power z of every point to its closest center is minimized …
Performance of Johnson-Lindenstrauss transform for k-means and k-medians clustering
K Makarychev, Y Makarychev… - Proceedings of the 51st …, 2019 - dl.acm.org
Consider an instance of Euclidean k-means or k-medians clustering. We show that the cost
of the optimal solution is preserved up to a factor of (1+ ε) under a projection onto a random …
Towards optimal lower bounds for k-median and k-means coresets
V Cohen-Addad, KG Larsen, D Saulpic… - Proceedings of the 54th …, 2022 - dl.acm.org
The (k, z)-clustering problem consists of finding a set of k points called centers, such that the
sum of distances raised to the power of z of every data point to its closest center is …
Neural ODE control for classification, approximation, and transport
D Ruiz-Balet, E Zuazua - SIAM Review, 2023 - SIAM
We analyze neural ordinary differential equations (NODEs) from a control theoretical
perspective to address some of the main properties and paradigms of deep learning (DL), in …
t-SNE-CUDA: GPU-Accelerated t-SNE and its Applications to Modern Data
Modern datasets and models are notoriously difficult to explore and analyze due to their
inherent high dimensionality and massive numbers of samples. Existing visualization …
Training (overparametrized) neural networks in near-linear time
The slow convergence rate and pathological curvature issues of first-order gradient methods
for training deep neural networks initiated an ongoing effort for developing faster $\mathit …
GPU accelerated t-distributed stochastic neighbor embedding
Modern datasets and models are notoriously difficult to explore and analyze due to their
inherent high dimensionality and massive numbers of samples. Existing visualization …