Gradient-free online learning in continuous games with delayed rewards

AJ Chan, H Sun, S Holt, M van der Schaar - arXiv preprint arXiv …, 2024 - arxiv.org

Reinforcement Learning from Human Feedback (RLHF) has been credited as the key
advance that has allowed Large Language Models (LLMs) to effectively follow instructions …

Cited by 26 Related articles All 3 versions

Asynchronous proportional response dynamics: convergence in markets with adversarial scheduling

Y Kolumbus, M Levy, N Nisan - Advances in Neural …, 2023 - proceedings.neurips.cc

We study Proportional Response Dynamics (PRD) in linear Fisher markets, where
participants act asynchronously. We model this scenario as a sequential process in which at …

Cited by 3 Related articles All 3 versions

Doubly optimal no-regret online learning in strongly monotone games with bandit feedback

W Ba, T Lin, J Zhang, Z Zhou - arXiv preprint arXiv:2112.02856, 2021 - arxiv.org

We consider online no-regret learning in unknown games with bandit feedback, where each
player can only observe its reward at each time, determined by all players' current joint …

Cited by 28 Related articles All 6 versions

Off-policy reinforcement learning with delayed rewards

B Han, Z Ren, Z Wu, Y Zhou… - … Conference on Machine …, 2022 - proceedings.mlr.press

We study deep reinforcement learning (RL) algorithms with delayed rewards. In many real-
world tasks, instant rewards are often not readily accessible or even defined immediately …

Cited by 37 Related articles All 7 versions

Learning long-term reward redistribution via randomized return decomposition

Z Ren, R Guo, Y Zhou, J Peng - arXiv preprint arXiv:2111.13485, 2021 - arxiv.org

Many practical applications of reinforcement learning require agents to learn from sparse
and delayed rewards. It challenges the ability of agents to attribute their actions to future …

Cited by 37 Related articles All 7 versions

Asymptotic convergence and performance of multi-agent q-learning dynamics

AA Hussain, F Belardinelli, G Piliouras - arXiv preprint arXiv:2301.09619, 2023 - arxiv.org

Achieving convergence of multiple learning agents in general $N$-player games is
imperative for the development of safe and reliable machine learning (ML) algorithms and …

Cited by 16 Related articles All 5 versions

A unified stochastic approximation framework for learning in games

P Mertikopoulos, YP Hsieh, V Cevher - Mathematical Programming, 2024 - Springer

We develop a flexible stochastic approximation framework for analyzing the long-run
behavior of learning in games (both continuous and finite). The proposed analysis template …

Cited by 16 Related articles All 27 versions

Multi-agent online optimization with delays: Asynchronicity, adaptivity, and optimism

YG Hsieh, F Iutzeler, J Malick… - Journal of Machine …, 2022 - jmlr.org

In this paper, we provide a general framework for studying multi-agent online learning
problems in the presence of delays and asynchronicities. Specifically, we propose and …

Cited by 37 Related articles All 15 versions

Payoff-based learning with matrix multiplicative weights in quantum games

K Lotidis, P Mertikopoulos… - Advances in Neural …, 2024 - proceedings.neurips.cc

In this paper, we study the problem of learning in quantum games, and other classes of
semidefinite games, with scalar, payoff-based feedback. For concreteness, we focus on the …

Cited by 1 Related articles All 15 versions

Asymptotically unbiased estimation for delayed feedback modeling via label correction

Y Chen, J Jin, H Zhao, P Wang, G Liu, J Xu… - Proceedings of the ACM …, 2022 - dl.acm.org

Alleviating the delayed feedback problem is of crucial importance for the conversion rate
(CVR) prediction in online advertising. Previous delayed feedback modeling methods using …

Cited by 19 Related articles All 3 versions