Contextual bandit with adaptive feature extraction
We consider an online decision-making setting known as the contextual bandit problem, and
propose an approach for improving contextual bandit performance by using an adaptive …
Conservative exploration in reinforcement learning
E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press
While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …
Online learning with corrupted context: Corrupted contextual bandits
D Bouneffouf - arXiv preprint arXiv:2006.15194, 2020 - arxiv.org
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with
side-information, or context, available to a decision-maker) where the context used at each …
Online semi-supervised learning in contextual bandits with episodic reward
B Lin - AI 2020: Advances in Artificial Intelligence: 33rd …, 2020 - Springer
We considered a novel practical problem of online learning with episodically revealed
rewards, motivated by several real-world applications, where the contexts are nonstationary …
Stochastic dueling bandits with adversarial corruption
The dueling bandits problem has received a lot of attention in recent years due to its
applications in recommendation systems and information retrieval. However, due to the …
Top k ranking for multi-armed bandit with noisy evaluations
E Garcelon, V Avadhanula… - International …, 2022 - proceedings.mlr.press
We consider a multi-armed bandit setting where, at the beginning of each round, the learner
receives noisy, independent, and possibly biased, evaluations of the true reward of each arm …
Online semi-supervised learning with bandit feedback
We formulate a new problem at the intersection of semi-supervised learning and contextual
bandits, motivated by several applications including clinical trials and dialog systems. We …
Corrupted contextual bandits: Online learning with corrupted context
D Bouneffouf - … 2021-2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with
side-information, or context, available to a decision-maker) where the context used at each …
Question Answering System with Sparse and Noisy Feedback
The rise of personal assistants has made question answering a very popular mechanism for
user-system interaction. In Question Answering System, implicit feedback can be easily …
Towards Scalability and Robustness for Ranking, Clustering, and Multi-Armed Bandits
P Patil - 2024 - repository.upenn.edu
In recent years, machine learning has become an indispensable tool across various industry
domains, revolutionizing the way businesses leverage data to make decisions. One of the …