Title, authors, venue | Cited by | Year |
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback H Lee, S Phatale, H Mansoor, K Lu, T Mesnard, C Bishop, V Carbune, ... arXiv preprint arXiv:2309.00267, 2023 | 247 | 2023 |
RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback H Lee, S Phatale, H Mansoor, T Mesnard, J Ferret, K Lu, C Bishop, E Hall, ... URL https://openreview.net/forum, 2024 | 5 | |
Prose for a Painting P Kashyap, S Phatale, I Drori arXiv preprint arXiv:1910.03634, 2019 | 3 | 2019 |
PERL: Parameter Efficient Reinforcement Learning from Human Feedback H Sidahmed, S Phatale, A Hutcheson, Z Lin, Z Chen, Z Yu, J Jin, ... arXiv preprint arXiv:2403.10704, 2024 | 2 | 2024 |
Improve Mathematical Reasoning in Language Models by Automated Process Supervision L Luo, Y Liu, R Liu, S Phatale, H Lara, Y Li, L Shu, Y Zhu, L Meng, J Sun, ... arXiv preprint arXiv:2406.06592, 2024 | 1 | 2024 |
SAFE: Software-defined authentication framework AV Kamath, K Kataoka, N Vijayvergiya, GB Reddy, S Phatale Proceedings of the 12th Asian Internet Engineering Conference, 57-63, 2016 | 1 | 2016 |
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback H Lee, S Phatale, H Mansoor, T Mesnard, J Ferret, KR Lu, C Bishop, ... Forty-first International Conference on Machine Learning | 1 | |
Conversational Recommendation as Retrieval: A Simple, Strong Baseline R Gupta, R Aksitov, S Phatale, S Chaudhary, H Lee, A Rastogi arXiv preprint arXiv:2305.13725, 2023 | | 2023 |