A systematic study on reinforcement learning based applications

K Sivamayil, E Rajasekar, B Aljafari, S Nikolovski… - Energies, 2023 - mdpi.com
We have analyzed 127 publications for this review paper, which discuss applications of
Reinforcement Learning (RL) in marketing, robotics, gaming, automated cars, natural …

Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook

B Lin - Expert Systems with Applications, 2024 - Elsevier
In recent years, reinforcement learning and bandits have transformed a wide range of
real-world applications including healthcare, finance, recommendation systems, robotics, and …

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield – for example, the probability of a click on a …

Scalable neural contextual bandit for recommender systems

Z Zhu, B Van Roy - Proceedings of the 32nd ACM International …, 2023 - dl.acm.org
High-quality recommender systems ought to deliver both innovative and relevant content
through effective and exploratory interactions with users. Yet, supervised learning-based …

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org
Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …

Flexible recommendation for optimizing the debt collection process based on customer risk using deep reinforcement learning

K Sivamayilvelan, E Rajasekar… - Expert Systems with …, 2024 - Elsevier
Finance sector loss can be minimized by reducing the number of defaulters who often miss
payments during debt collection. Most research focused on the credit risk analysis before …

Deep meta-learning in recommendation systems: A survey

C Wang, Y Zhu, H Liu, T Zang, J Yu, F Tang - arXiv preprint arXiv …, 2022 - arxiv.org
Deep neural network based recommendation systems have achieved great success as
information filtering techniques in recent years. However, since model training from scratch …

Efficient online Bayesian inference for neural bandits

G Duran-Martin, A Kara… - … Conference on Artificial …, 2022 - proceedings.mlr.press
In this paper we present a new algorithm for online (sequential) inference in Bayesian
neural networks, and show its suitability for tackling contextual bandit problems. The key …

Conservative exploration in reinforcement learning

E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press
While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …

Adversarial gradient driven exploration for deep click-through rate prediction

K Wu, W Bian, Z Chan, L Ren, S Xiang… - Proceedings of the 28th …, 2022 - dl.acm.org
Exploration-Exploitation (E&E) algorithms are commonly adopted to deal with the
feedback-loop issue in large-scale online recommender systems. Most of existing studies believe that …