(More) efficient reinforcement learning via posterior sampling

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 196 · Related articles · All 13 versions

[PDF] arxiv.org

Causal reinforcement learning: A survey

Z Deng, J Jiang, G Long, C Zhang - arXiv preprint arXiv:2307.01452, 2023 - arxiv.org

Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …

Cited by 15 · Related articles · All 5 versions

[HTML] nih.gov

Personalized HeartSteps: A reinforcement learning algorithm for optimizing physical activity

P Liao, K Greenewald, P Klasnja… - Proceedings of the ACM on …, 2020 - dl.acm.org

With the recent proliferation of mobile health technologies, health scientists are increasingly
interested in developing just-in-time adaptive interventions (JITAIs), typically delivered via …

Cited by 202 · Related articles · All 10 versions

[PDF] neurips.cc

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc

We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

Cited by 253 · Related articles · All 13 versions

[PDF] ssrn.com

Customer acquisition via display advertising using multi-armed bandit experiments

EM Schwartz, ET Bradlow, PS Fader - Marketing Science, 2017 - pubsonline.informs.org

Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …

Cited by 388 · Related articles · All 13 versions

[PDF] mdpi.com

Reinforcement learning for efficient network penetration testing

MC Ghanem, TM Chen - Information, 2019 - mdpi.com

Penetration testing (also known as pentesting or PT) is a common practice for actively
assessing the defenses of a computer network by planning and executing all possible …

Cited by 156 · Related articles · All 10 versions

[PDF] arxiv.org

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Cited by 14 · Related articles · All 3 versions

[PDF] neurips.cc

Efficient model-based reinforcement learning through optimistic policy search and planning

S Curi, F Berkenkamp, A Krause - Advances in Neural …, 2020 - proceedings.neurips.cc

Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …

Cited by 106 · Related articles · All 7 versions

[PDF] neurips.cc

Model-based reinforcement learning and the eluder dimension

I Osband, B Van Roy - Advances in Neural Information …, 2014 - proceedings.neurips.cc

We consider the problem of learning to optimize an unknown Markov decision process
(MDP). We show that, if the MDP can be parameterized within some known function class …

Cited by 213 · Related articles · All 7 versions

[PDF] informs.org

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Operations Research, 2018 - pubsonline.informs.org

We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …

Cited by 146 · Related articles · All 6 versions