H2O: Heavy-hitter oracle for efficient generative inference of large language models
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …
Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …
A coded compressed sensing scheme for unsourced multiple access
VK Amalladinne, JF Chamberland… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This article introduces a novel scheme, termed coded compressed sensing, for unsourced
multiple-access communication. The proposed divide-and-conquer approach leverages …
Bipartite matching in nearly-linear time on moderately dense graphs
We present an $\tilde{O}(m + n^{1.5})$-time randomized algorithm for maximum cardinality bipartite
matching and related problems (e.g., transshipment, negative-weight shortest paths, and …
HyperAttention: Long-context attention in near-linear time
We present an approximate attention mechanism named HyperAttention to address the
computational challenges posed by the growing complexity of long contexts used in Large …
Faster dynamic matrix inverse for faster LPs
Motivated by recent Linear Programming solvers, we design dynamic data structures for
maintaining the inverse of an $n \times n$ real matrix under $\textit{low-rank}$ updates …
Heavy hitters and the structure of local privacy
We present a new locally differentially private algorithm for the heavy hitters problem that
achieves optimal worst-case error as a function of all standardly considered parameters …
Solving tall dense linear programs in nearly linear time
In this paper we provide an $O(nd + d^3)$ time randomized algorithm for solving linear
programs with d variables and n constraints with high probability. To obtain this result we …
A faster algorithm for solving general LPs
The fastest known LP solver for general (dense) linear programs is due to [Cohen, Lee and
Song'19] and runs in $O^*(n^{\omega} + n^{2.5-\alpha/2} + n^{2+1/6})$ time. A number of follow-up works [Lee …
Relative error tensor low rank approximation
We consider relative error low rank approximation of tensors with respect to the Frobenius
norm. Namely, given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^{q} n_i}$, output a rank-$k$ tensor $B$ for which …