Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Causal reinforcement learning: A survey

Z Deng, J Jiang, G Long, C Zhang - arXiv preprint arXiv:2307.01452, 2023 - arxiv.org
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …

Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity

P Liao, K Greenewald, P Klasnja… - Proceedings of the ACM on …, 2020 - dl.acm.org
With the recent proliferation of mobile health technologies, health scientists are increasingly
interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via …

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

Customer acquisition via display advertising using multi-armed bandit experiments

EM Schwartz, ET Bradlow, PS Fader - Marketing Science, 2017 - pubsonline.informs.org
Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …

Reinforcement learning for efficient network penetration testing

MC Ghanem, TM Chen - Information, 2019 - mdpi.com
Penetration testing (also known as pentesting or PT) is a common practice for actively
assessing the defenses of a computer network by planning and executing all possible …

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Efficient model-based reinforcement learning through optimistic policy search and planning

S Curi, F Berkenkamp, A Krause - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …

Model-based reinforcement learning and the eluder dimension

I Osband, B Van Roy - Advances in Neural Information …, 2014 - proceedings.neurips.cc
We consider the problem of learning to optimize an unknown Markov decision process
(MDP). We show that, if the MDP can be parameterized within some known function class …

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Operations Research, 2018 - pubsonline.informs.org
We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …