Title | Authors | Venue | Cited by | Year |
When can we learn general-sum Markov games with a large number of players sample-efficiently? | Z Song, S Mei, Y Bai | arXiv preprint arXiv:2110.04184, 2021 | 94 | 2021 |
Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent | Y Bai, C Jin, S Mei, Z Song, T Yu | Advances in Neural Information Processing Systems 35, 22313-22325, 2022 | 17 | 2022 |
Reward collapse in aligning large language models | Z Song, T Cai, JD Lee, WJ Su | arXiv preprint arXiv:2305.17608, 2023 | 16 | 2023 |
Sample-efficient learning of correlated equilibria in extensive-form games | Z Song, S Mei, Y Bai | Advances in Neural Information Processing Systems 35, 4099-4110, 2022 | 12 | 2022 |
Reward Collapse in Aligning Large Language Models: A Prompt-Aware Approach to Preference Rankings | Z Song, T Cai, JD Lee, WJ Su | ICML 2023 Workshop The Many Facets of Preference-Based Learning, 2023 | 2 | 2023 |