Deep Bayesian bandits: Exploring in online personalized recommendations

K Sivamayil, E Rajasekar, B Aljafari, S Nikolovski… - Energies, 2023 - mdpi.com

We have analyzed 127 publications for this review paper, which discuss applications of
Reinforcement Learning (RL) in marketing, robotics, gaming, automated cars, natural …

Cited by 73 · Related articles · All 7 versions

[PDF] arxiv.org

Reinforcement learning and bandits for speech and language processing: Tutorial, review and outlook

B Lin - Expert Systems with Applications, 2024 - Elsevier

In recent years, reinforcement learning and bandits have transformed a wide range of
real-world applications including healthcare, finance, recommendation systems, robotics, and …

Cited by 33 · Related articles · All 7 versions

[PDF] researchgate.net

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org

Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield–for example, the probability of a click on a …

Cited by 51 · Related articles · All 4 versions

[PDF] arxiv.org

Scalable neural contextual bandit for recommender systems

Z Zhu, B Van Roy - Proceedings of the 32nd ACM International …, 2023 - dl.acm.org

High-quality recommender systems ought to deliver both innovative and relevant content
through effective and exploratory interactions with users. Yet, supervised learning-based …

Cited by 15 · Related articles · All 4 versions

[PDF] acm.org

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org

Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …

Cited by 16 · Related articles · All 2 versions

Flexible recommendation for optimizing the debt collection process based on customer risk using deep reinforcement learning

K Sivamayilvelan, E Rajasekar… - Expert Systems with …, 2024 - Elsevier

Finance sector loss can be minimized by reducing the number of defaulters who often miss
payments during debt collection. Most research focused on the credit risk analysis before …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Deep meta-learning in recommendation systems: A survey

C Wang, Y Zhu, H Liu, T Zang, J Yu, F Tang - arXiv preprint arXiv …, 2022 - arxiv.org

Deep neural network based recommendation systems have achieved great success as
information filtering techniques in recent years. However, since model training from scratch …

Cited by 17 · Related articles · All 2 versions

[PDF] mlr.press

Efficient online Bayesian inference for neural bandits

G Duran-Martin, A Kara… - … Conference on Artificial …, 2022 - proceedings.mlr.press

In this paper we present a new algorithm for online (sequential) inference in Bayesian
neural networks, and show its suitability for tackling contextual bandit problems. The key …

Cited by 17 · Related articles · All 3 versions

[PDF] mlr.press

Conservative exploration in reinforcement learning

E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press

While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …

Cited by 31 · Related articles · All 11 versions

[PDF] arxiv.org

Adversarial gradient driven exploration for deep click-through rate prediction

K Wu, W Bian, Z Chan, L Ren, S Xiang… - Proceedings of the 28th …, 2022 - dl.acm.org

Exploration-Exploitation (E&E) algorithms are commonly adopted to deal with the
feedback-loop issue in large-scale online recommender systems. Most of existing studies believe that …

Cited by 14 · Related articles · All 4 versions