Title | Authors | Venue | Cited by | Year |
Trained transformers learn linear models in-context | R Zhang, S Frei, PL Bartlett | Journal of Machine Learning Research 25 (49), 1-55, 2024 | 97 | 2024 |
Off-policy fitted Q-evaluation with differentiable function approximators: Z-estimation and inference theory | R Zhang, X Zhang, C Ni, M Wang | International Conference on Machine Learning, 26713-26749, 2022 | 18 | 2022 |
Negative preference optimization: From catastrophic collapse to effective unlearning | R Zhang, L Lin, Y Bai, S Mei | arXiv preprint arXiv:2404.05868, 2024 | 14 | 2024 |
AutoPRM: Automating Procedural Supervision for Multi-Step Reasoning via Controllable Question Decomposition | Z Chen, Z Zhao, Z Zhu, R Zhang, X Li, B Raj, H Yao | NAACL 2024, 2024 | 5 | 2024 |
Optimal estimation of policy gradient via double fitted iteration | C Ni, R Zhang, X Ji, X Zhang, M Wang | International Conference on Machine Learning, 16724-16783, 2022 | 5* | 2022 |
Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data | R Zhang, A Zanette | Advances in Neural Information Processing Systems, 2024, 2023 | 4 | 2023 |
In-context learning of a linear Transformer block: benefits of the MLP component and one-step GD initialization | R Zhang, J Wu, PL Bartlett | arXiv preprint arXiv:2402.14951, 2024 | 3 | 2024 |
Is Offline Decision Making Possible with Only Few Samples? Reliable Decisions in Data-Starved Bandits via Trust Region Enhancement | R Zhang, Y Zhai, A Zanette | arXiv preprint arXiv:2402.15703, 2024 | | 2024 |
Accelerating Best-of-N via Speculative Rejection | R Zhang, M Haider, M Yin, J Qiu, M Wang, P Bartlett, A Zanette | 2nd Workshop on Advancing Neural Network Training: Computational Efficiency … | | |