Contextual bandit with adaptive feature extraction

B Lin, D Bouneffouf, GA Cecchi… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
We consider an online decision-making setting known as the contextual bandit problem, and
propose an approach for improving contextual bandit performance by using an adaptive …

Conservative exploration in reinforcement learning

E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press
While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …

Online learning with corrupted context: Corrupted contextual bandits

D Bouneffouf - arXiv preprint arXiv:2006.15194, 2020 - arxiv.org
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with
side-information, or context, available to a decision-maker) where the context used at each …

Online semi-supervised learning in contextual bandits with episodic reward

B Lin - AI 2020: Advances in Artificial Intelligence: 33rd …, 2020 - Springer
We considered a novel practical problem of online learning with episodically revealed
rewards, motivated by several real-world applications, where the contexts are nonstationary …

Stochastic dueling bandits with adversarial corruption

A Agarwal, S Agarwal, P Patil - Algorithmic Learning Theory, 2021 - proceedings.mlr.press
The dueling bandits problem has received a lot of attention in recent years due to its
applications in recommendation systems and information retrieval. However, due to the …

Top k ranking for multi-armed bandit with noisy evaluations

E Garcelon, V Avadhanula… - International …, 2022 - proceedings.mlr.press
We consider a multi-armed bandit setting where, at the beginning of each round, the learner
receives noisy, independent, and possibly biased evaluations of the true reward of each arm …

Online semi-supervised learning with bandit feedback

M Yurochkin, S Upadhyay, D Bouneffouf, M Agarwal… - 2019 - openreview.net
We formulate a new problem at the intersection of semi-supervised learning and contextual
bandits, motivated by several applications including clinical trials and dialog systems. We …

Corrupted contextual bandits: Online learning with corrupted context

D Bouneffouf - … 2021-2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with
side-information, or context, available to a decision-maker) where the context used at each …

Question Answering System with Sparse and Noisy Feedback

D Bouneffouf, O Alkan, R Feraud… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
The rise of personal assistants has made question answering a very popular mechanism for
user-system interaction. In question answering systems, implicit feedback can be easily …

Towards Scalability and Robustness for Ranking, Clustering, and Multi-Armed Bandits

P Patil - 2024 - repository.upenn.edu
In recent years, machine learning has become an indispensable tool across various industry
domains, revolutionizing the way businesses leverage data to make decisions. One of the …