Regret bounds for information-directed reinforcement learning

Y Xu, A Zeevi - International Conference on Machine …, 2023 - proceedings.mlr.press

We develop a general theory to optimize the frequentist regret for sequential learning
problems, where efficient bandit and reinforcement learning algorithms can be derived from …

Cited by 11 · Related articles · All 8 versions

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc

In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Cited by 4 · Related articles · All 6 versions

Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications

J Kirschner, T Lattimore, A Krause - The Journal of Machine Learning …, 2023 - dl.acm.org

Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …

Cited by 5 · Related articles · All 4 versions

Deciding what to model: Value-equivalent sampling for reinforcement learning

D Arumugam, B Van Roy - Advances in neural information …, 2022 - proceedings.neurips.cc

The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …

Cited by 15 · Related articles · All 7 versions

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press

We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Cited by 6 · Related articles · All 6 versions

Value of Information and Reward Specification in Active Inference and POMDPs

R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org

Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …

Cited by 2 · Related articles · All 3 versions

Bayesian reinforcement learning with limited cognitive load

D Arumugam, MK Ho, ND Goodman, B Van Roy - Open Mind, 2024 - direct.mit.edu

All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …

Cited by 10 · Related articles · All 7 versions

Probabilistic inference in reinforcement learning done right

J Tarbouriech, T Lattimore… - Advances in Neural …, 2024 - proceedings.neurips.cc

A popular perspective in Reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …

Cited by 1 · Related articles · All 6 versions

Steering: Stein information directed exploration for model-based reinforcement learning

S Chakraborty, AS Bedi, A Koppel, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Directed Exploration is a crucial challenge in reinforcement learning (RL), especially when
rewards are sparse. Information-directed sampling (IDS), which optimizes the information …

Cited by 6 · Related articles · All 7 versions

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Y Li, G Cheng, X Dai - arXiv preprint arXiv:2406.04374, 2024 - arxiv.org

Recommender systems play a crucial role in internet economies by connecting users with
relevant products or services. However, designing effective recommender systems faces two …

Cited by 1 · Related articles · All 2 versions