| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | T Dettmers, R Svirschevski, V Egiazarian, D Kuznedelev, E Frantar, ... | arXiv preprint arXiv:2306.03078, 2023 | 124 | 2023 |
| Sequoia: Scalable, Robust, and Hardware-Aware Speculative Decoding | Z Chen, A May, R Svirschevski, Y Huang, M Ryabinin, Z Jia, B Chen | arXiv preprint arXiv:2402.12374, 2024 | 10 | 2024 |
| SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices | R Svirschevski, A May, Z Chen, B Chen, Z Jia, M Ryabinin | arXiv preprint arXiv:2406.02532, 2024 | 1 | 2024 |
| Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization | V Egiazarian, D Kuznedelev, A Voronov, R Svirschevski, M Goin, ... | arXiv preprint arXiv:2409.00492, 2024 | | 2024 |
| Privacy Preserving API Fine-tuning for LLMs | P Zmushko, M Mansurov, R Svirschevski, D Kuznedelev, M Ryabinin, ... | | | |