Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... arXiv preprint arXiv:2312.11805, 2023 | 924 | 2023 |
Poseidon: An efficient communication architecture for distributed deep learning on GPU clusters H Zhang, Z Zheng, S Xu, W Dai, Q Ho, X Liang, Z Hu, J Wei, P Xie, ... 2017 USENIX Annual Technical Conference (USENIX ATC 17), 181-193, 2017 | 410 | 2017 |
On learning intrinsic rewards for policy gradient methods Z Zheng, J Oh, S Singh Advances in Neural Information Processing Systems, 4644-4654, 2018 | 200 | 2018 |
Parallelizing sequential graph computations W Fan, J Xu, Y Wu, W Yu, J Jiang, Z Zheng, B Zhang, Y Cao, C Tian Proceedings of the 2017 ACM International Conference on Management of Data …, 2017 | 119 | 2017 |
What Can Learned Intrinsic Rewards Capture? Z Zheng, J Oh, M Hessel, Z Xu, M Kroiss, H Van Hasselt, D Silver, S Singh International Conference on Machine Learning, 11436-11446, 2020 | 90 | 2020 |
Automated multi-layer optical design via deep reinforcement learning H Wang, Z Zheng, C Ji, LJ Guo Machine Learning: Science and Technology 2 (2), 025013, 2021 | 60 | 2021 |
Understanding plasticity in neural networks C Lyle, Z Zheng, E Nikishin, BA Pires, R Pascanu, W Dabney International Conference on Machine Learning, 23190-23211, 2023 | 44 | 2023 |
Generalized Preference Optimization: A Unified Approach to Offline Alignment Y Tang, ZD Guo, Z Zheng, D Calandriello, R Munos, M Rowland, ... arXiv preprint arXiv:2402.05749, 2024 | 21 | 2024 |
Understanding the performance gap between online and offline alignment algorithms Y Tang, DZ Guo, Z Zheng, D Calandriello, Y Cao, E Tarassov, R Munos, ... arXiv preprint arXiv:2405.08448, 2024 | 13 | 2024 |
Disentangling the Causes of Plasticity Loss in Neural Networks C Lyle, Z Zheng, K Khetarpal, H van Hasselt, R Pascanu, J Martens, ... arXiv preprint arXiv:2402.18762, 2024 | 8 | 2024 |
Adaptive Pairwise Weights for Temporal Credit Assignment Z Zheng, R Vuorio, R Lewis, S Singh Proceedings of the AAAI Conference on Artificial Intelligence 36 (8), 9225-9232, 2022 | 7* | 2022 |
Learning State Representations from Random Deep Action-conditional Predictions Z Zheng, V Veeriah, R Vuorio, RL Lewis, S Singh Advances in Neural Information Processing Systems 34, 23679-23691, 2021 | 6 | 2021 |
Towards multi-agent reinforcement learning-driven over-the-counter market simulations N Vadori, L Ardon, S Ganesh, T Spooner, S Amrouni, J Vann, M Xu, ... Mathematical Finance 34 (2), 262-347, 2024 | 5 | 2024 |
GrASP: Gradient-Based Affordance Selection for Planning V Veeriah, Z Zheng, R Lewis, S Singh arXiv preprint arXiv:2202.04772, 2022 | 4 | 2022 |
Human Alignment of Large Language Models through Online Preference Optimisation D Calandriello, D Guo, R Munos, M Rowland, Y Tang, BA Pires, ... arXiv preprint arXiv:2403.08635, 2024 | 2 | 2024 |
Advances in Deep Reinforcement Learning: Intrinsic Rewards, Temporal Credit Assignment, State Representations, and Value-equivalent Models Z Zheng | | 2022 |
Reinforcement learning using meta-learned intrinsic rewards Z Zheng, J Oh, SS Baveja US Patent App. 17/033,410, 2021 | | 2021 |
Towards Perpetually Trainable Neural Networks C Lyle, Z Zheng, K Khetarpal, R Pascanu, J Martens, H van Hasselt, ... | | |
Supplementary Material: On Learning Intrinsic Rewards for Policy Gradient Methods Z Zheng, J Oh, S Singh | | |