Meta-tuning loss functions and data augmentation for few-shot object detection

B Demirel, OB Baran, RG Cinbis - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Few-shot object detection, the problem of modelling novel object detection categories with
few training instances, is an emerging topic in the area of few-shot learning and object …

Importance sampling techniques for policy optimization

AM Metelli, M Papini, N Montali, M Restelli - Journal of Machine Learning …, 2020 - jmlr.org
How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …

On the hidden biases of policy mirror ascent in continuous action spaces

AS Bedi, S Chakraborty, A Parayil… - International …, 2022 - proceedings.mlr.press
We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …

Alleviating parameter-tuning burden in reinforcement learning for large-scale process control

L Zhu, G Takami, M Kawahara, H Kanokogi… - Computers & Chemical …, 2022 - Elsevier
Modern process controllers necessitate high quality models and remedial system
re-identification upon performance degradation. Reinforcement Learning (RL) can be a …

Truncating trajectories in Monte Carlo reinforcement learning

R Poiani, AM Metelli, M Restelli - … Conference on Machine …, 2023 - proceedings.mlr.press
In Reinforcement Learning (RL), an agent acts in an unknown environment to
maximize the expected cumulative discounted sum of an external reward signal, i.e., the …

Smoothing policies and safe policy gradients

M Papini, M Pirotta, M Restelli - Machine Learning, 2022 - Springer
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org
Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …

MAD for robust reinforcement learning in machine translation

D Donato, L Yu, W Ling, C Dyer - arXiv preprint arXiv:2207.08583, 2022 - arxiv.org
We introduce a new distributed policy gradient algorithm and show that it outperforms
existing reward-aware training procedures such as REINFORCE, minimum risk training …

Policy gradient algorithms implicitly optimize by continuation

A Bolland, G Louppe, D Ernst - arXiv preprint arXiv:2305.06851, 2023 - arxiv.org
Direct policy optimization in reinforcement learning is usually solved with policy-gradient
algorithms, which optimize policy parameters via stochastic gradient ascent. This paper …

Adaptive exploration policy for exploration–exploitation tradeoff in continuous action control optimization

M Li, T Huang, W Zhu - International Journal of Machine Learning and …, 2021 - Springer
The optimization of continuous action control is an important research field. It aims to find
optimal decisions by the experience of making decisions in a continuous action control task …