Balancing learning speed and stability in policy gradient via adaptive exploration

B Demirel, OB Baran, RG Cinbis - proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Few-shot object detection, the problem of modelling novel object detection categories with
few training instances, is an emerging topic in the area of few-shot learning and object …

Cited by 15 · Related articles · All 8 versions

Importance sampling techniques for policy optimization

AM Metelli, M Papini, N Montali, M Restelli - Journal of Machine Learning …, 2020 - jmlr.org

How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …

Cited by 57 · Related articles · All 6 versions

On the hidden biases of policy mirror ascent in continuous action spaces

AS Bedi, S Chakraborty, A Parayil… - International …, 2022 - proceedings.mlr.press

We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …

Cited by 19 · Related articles · All 6 versions

Alleviating parameter-tuning burden in reinforcement learning for large-scale process control

L Zhu, G Takami, M Kawahara, H Kanokogi… - Computers & Chemical …, 2022 - Elsevier

Modern process controllers necessitate high quality models and remedial system
re-identification upon performance degradation. Reinforcement Learning (RL) can be a …

Cited by 13 · Related articles · All 2 versions

Truncating trajectories in Monte Carlo reinforcement learning

R Poiani, AM Metelli, M Restelli - … Conference on Machine …, 2023 - proceedings.mlr.press

In Reinforcement Learning (RL), an agent acts in an unknown environment to
maximize the expected cumulative discounted sum of an external reward signal, i.e., the …

Cited by 4 · Related articles · All 8 versions

Smoothing policies and safe policy gradients

M Papini, M Pirotta, M Restelli - Machine Learning, 2022 - Springer

Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …

Cited by 40 · Related articles · All 11 versions

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org

Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …

Cited by 13 · Related articles · All 3 versions

MAD for robust reinforcement learning in machine translation

D Donato, L Yu, W Ling, C Dyer - arXiv preprint arXiv:2207.08583, 2022 - arxiv.org

We introduce a new distributed policy gradient algorithm and show that it outperforms
existing reward-aware training procedures such as REINFORCE, minimum risk training …

Cited by 6 · Related articles · All 3 versions

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org

Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …

Cited by 3 · Related articles · All 6 versions

Adaptive exploration policy for exploration–exploitation tradeoff in continuous action control optimization

M Li, T Huang, W Zhu - International Journal of Machine Learning and …, 2021 - Springer

The optimization of continuous action control is an important research field. It aims to find
optimal decisions by the experience of making decisions in a continuous action control task …

Cited by 9 · Related articles