Proximal policy optimization algorithms. J Schulman, F Wolski, P Dhariwal, A Radford, O Klimov. arXiv preprint arXiv:1707.06347, 2017. Cited by 19043.
Exploration by random network distillation. Y Burda, H Edwards, A Storkey, O Klimov. arXiv preprint arXiv:1810.12894, 2018. Cited by 1385.
OpenAI Baselines. P Dhariwal, C Hesse, O Klimov, A Nichol, M Plappert, A Radford, ... https://github.com/openai/baselines, 2017. Cited by 1026.
Stable Baselines. A Hill, A Raffin, M Ernestus, A Gleave, A Kanervisto, R Traore, P Dhariwal, ... 2018. Cited by 896.
Quantifying generalization in reinforcement learning. K Cobbe, O Klimov, C Hesse, T Kim, J Schulman. International Conference on Machine Learning, 1282-1289, 2019. Cited by 690.
Gotta learn fast: A new benchmark for generalization in RL. A Nichol, V Pfau, C Hesse, O Klimov, J Schulman. arXiv preprint arXiv:1804.03720, 2018. Cited by 210.
Phasic policy gradient. KW Cobbe, J Hilton, O Klimov, J Schulman. International Conference on Machine Learning, 2020-2027, 2021. Cited by 169.
CarRacing-v0. O Klimov. https://gym.openai.com/envs/CarRacing-v0, 2016. Cited by 22.
Multi-task curriculum learning in a complex, visual, hard-exploration domain: Minecraft. I Kanitscheider, J Huizinga, D Farhi, WH Guss, B Houghton, R Sampedro, ... arXiv preprint arXiv:2106.14876, 2021. Cited by 18.