Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Causal reinforcement learning: A survey
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …
Personalized heartsteps: A reinforcement learning algorithm for optimizing physical activity
With the recent proliferation of mobile health technologies, health scientists are increasingly
interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via …
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …
Customer acquisition via display advertising using multi-armed bandit experiments
Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …
Reinforcement learning for efficient network penetration testing
Penetration testing (also known as pentesting or PT) is a common practice for actively
assessing the defenses of a computer network by planning and executing all possible …
Self-exploring language models: Active preference elicitation for online alignment
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …
Efficient model-based reinforcement learning through optimistic policy search and planning
Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …
Model-based reinforcement learning and the eluder dimension
We consider the problem of learning to optimize an unknown Markov decision process
(MDP). We show that, if the MDP can be parameterized within some known function class …
Learning to optimize via information-directed sampling
We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …