A Gorbatovski, B Shaposhnikov, A Malakhov, N Surnachev, Y Aksenov, et al. "Learn your reference model for real good alignment." arXiv preprint arXiv:2404.09656, 2024. Cited by 11.
N Balagansky, D Gavrilov. "PALBERT: Teaching ALBERT to Ponder." Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 14002–14012, 2022. Cited by 6.
SM Lo Cicero Vaina, N Balagansky, D Gavrilov. "Diffusion Language Models Generation Can Be Halted Early." arXiv preprint arXiv:2305.10818, 2023. Cited by 5*.
A Sitdikov, N Balagansky, D Gavrilov, A Markov. "Classifiers are better experts for controllable text generation." arXiv preprint arXiv:2205.07276, 2022. Cited by 4.
C Artem, G Daniil, B Nikita, K Pavel. "Weight squeezing: Reparameterization for extreme compression and fast inference." arXiv preprint arXiv:2010.06993, 2020. Cited by 2.
Y Aksenov, N Balagansky, SM Lo Cicero Vaina, B Shaposhnikov, A Gorbatovski, et al. "Linear Transformers with Learnable Kernel Functions are Better In-Context Models." Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024. Cited by 1.
D Gavrilov, N Balagansky. "Ahead-of-Time P-Tuning." arXiv preprint arXiv:2305.10835, 2023. Cited by 1.
M Rofin, N Balagansky, D Gavrilov. "Linear interpolation in parameter space is good enough for fine-tuned language models." arXiv preprint arXiv:2211.12092, 2022. Cited by 1.