H2O: Heavy-hitter oracle for efficient generative inference of large language models

Z Zhang, Y Sheng, T Zhou, T Chen… - Advances in …, 2023 - proceedings.neurips.cc
Large Language Models (LLMs), despite their recent impressive accomplishments,
are notably cost-prohibitive to deploy, particularly for applications involving long-content …

Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances

J Van Den Brand, YT Lee, YP Liu, T Saranurak… - Proceedings of the 53rd …, 2021 - dl.acm.org
In this paper we provide new randomized algorithms with improved runtimes for solving
linear programs with two-sided constraints. In the special case of the minimum cost flow …

A coded compressed sensing scheme for unsourced multiple access

VK Amalladinne, JF Chamberland… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
This article introduces a novel scheme, termed coded compressed sensing, for unsourced
multiple-access communication. The proposed divide-and-conquer approach leverages …

Bipartite matching in nearly-linear time on moderately dense graphs

J van den Brand, YT Lee, D Nanongkai… - 2020 IEEE 61st …, 2020 - ieeexplore.ieee.org
We present an $\tilde{O}(m + n^{1.5})$-time randomized algorithm for maximum cardinality bipartite
matching and related problems (e.g., transshipment, negative-weight shortest paths, and …

HyperAttention: Long-context attention in near-linear time

I Han, R Jayaram, A Karbasi, V Mirrokni… - arXiv preprint arXiv …, 2023 - arxiv.org
We present an approximate attention mechanism named HyperAttention to address the
computational challenges posed by the growing complexity of long contexts used in Large …

Faster dynamic matrix inverse for faster LPs

S Jiang, Z Song, O Weinstein, H Zhang - arXiv preprint arXiv:2004.07470, 2020 - arxiv.org
Motivated by recent Linear Programming solvers, we design dynamic data structures for
maintaining the inverse of an $n \times n$ real matrix under low-rank updates …

Heavy hitters and the structure of local privacy

M Bun, J Nelson, U Stemmer - ACM Transactions on Algorithms (TALG), 2019 - dl.acm.org
We present a new locally differentially private algorithm for the heavy hitters problem that
achieves optimal worst-case error as a function of all standardly considered parameters …

Solving tall dense linear programs in nearly linear time

J van den Brand, YT Lee, A Sidford… - Proceedings of the 52nd …, 2020 - dl.acm.org
In this paper we provide an $O(nd + d^3)$ time randomized algorithm for solving linear
programs with d variables and n constraints with high probability. To obtain this result we …

A faster algorithm for solving general LPs

S Jiang, Z Song, O Weinstein, H Zhang - Proceedings of the 53rd Annual …, 2021 - dl.acm.org
The fastest known LP solver for general (dense) linear programs is due to [Cohen, Lee and
Song'19] and runs in $O^*(n^{\omega} + n^{2.5 - \alpha/2} + n^{2 + 1/6})$ time. A number of follow-up works [Lee …

Relative error tensor low rank approximation

Z Song, DP Woodruff, P Zhong - Proceedings of the Thirtieth Annual ACM …, 2019 - SIAM
We consider relative error low rank approximation of tensors with respect to the Frobenius
norm. Namely, given an order-$q$ tensor $A \in \mathbb{R}^{\prod_{i=1}^{q} n_i}$, output a rank-$k$ tensor $B$ for which …