Bayesian design principles for frequentist sequential learning

Y Xu, A Zeevi - International Conference on Machine …, 2023 - proceedings.mlr.press
We develop a general theory to optimize the frequentist regret for sequential learning
problems, where efficient bandit and reinforcement learning algorithms can be derived from …

Improved Bayesian regret bounds for Thompson sampling in reinforcement learning

A Moradipari, M Pedramfar… - Advances in …, 2023 - proceedings.neurips.cc
In this paper, we prove state-of-the-art Bayesian regret bounds for Thompson Sampling in
reinforcement learning in a multitude of settings. We present a refined analysis of the …

Linear partial monitoring for sequential decision making: Algorithms, regret bounds and applications

J Kirschner, T Lattimore, A Krause - The Journal of Machine Learning …, 2023 - dl.acm.org
Partial monitoring is an expressive framework for sequential decision-making with an
abundance of applications, including graph-structured and dueling bandits, dynamic pricing …

Deciding what to model: Value-equivalent sampling for reinforcement learning

D Arumugam, B Van Roy - Advances in neural information …, 2022 - proceedings.neurips.cc
The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …

Leveraging demonstrations to improve online learning: Quality matters

B Hao, R Jain, T Lattimore… - … on Machine Learning, 2023 - proceedings.mlr.press
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …

Value of Information and Reward Specification in Active Inference and POMDPs

R Wei - arXiv preprint arXiv:2408.06542, 2024 - arxiv.org
Expected free energy (EFE) is a central quantity in active inference which has recently
gained popularity due to its intuitive decomposition of the expected value of control into a …

Bayesian reinforcement learning with limited cognitive load

D Arumugam, MK Ho, ND Goodman, B Van Roy - Open Mind, 2024 - direct.mit.edu
All biological and artificial agents must act given limits on their ability to acquire and process
information. As such, a general theory of adaptive behavior should be able to account for the …

Probabilistic inference in reinforcement learning done right

J Tarbouriech, T Lattimore… - Advances in Neural …, 2024 - proceedings.neurips.cc
A popular perspective in reinforcement learning (RL) casts the problem as probabilistic
inference on a graphical model of the Markov decision process (MDP). The core object of …

Steering: Stein information directed exploration for model-based reinforcement learning

S Chakraborty, AS Bedi, A Koppel, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Directed exploration is a crucial challenge in reinforcement learning (RL), especially when
rewards are sparse. Information-directed sampling (IDS), which optimizes the information …

Dynamic Online Recommendation for Two-Sided Market with Bayesian Incentive Compatibility

Y Li, G Cheng, X Dai - arXiv preprint arXiv:2406.04374, 2024 - arxiv.org
Recommender systems play a crucial role in internet economies by connecting users with
relevant products or services. However, designing effective recommender systems faces two …