Corrupt bandits

Contextual bandit with adaptive feature extraction

B Lin, D Bouneffouf, GA Cecchi… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org

We consider an online decision making setting known as contextual bandit problem, and
propose an approach for improving contextual bandit performance by using an adaptive …

Cited by 45; all 6 versions

Conservative exploration in reinforcement learning

E Garcelon, M Ghavamzadeh… - International …, 2020 - proceedings.mlr.press

While learning in an unknown Markov Decision Process (MDP), an agent should trade off
exploration to discover new information about the MDP, and exploitation of the current …

Cited by 31; all 11 versions

Online learning with corrupted context: Corrupted contextual bandits

D Bouneffouf - arXiv preprint arXiv:2006.15194, 2020 - arxiv.org

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with
side-information, or context, available to a decision-maker) where the context used at each …

Cited by 25; all 2 versions

Online semi-supervised learning in contextual bandits with episodic reward

B Lin - AI 2020: Advances in Artificial Intelligence: 33rd …, 2020 - Springer

We considered a novel practical problem of online learning with episodically revealed
rewards, motivated by several real-world applications, where the contexts are nonstationary …

Cited by 18; all 10 versions

Stochastic dueling bandits with adversarial corruption

A Agarwal, S Agarwal, P Patil - Algorithmic Learning Theory, 2021 - proceedings.mlr.press

The dueling bandits problem has received a lot of attention in recent years due to its
applications in recommendation systems and information retrieval. However, due to the …

Cited by 13; all 5 versions

Top k ranking for multi-armed bandit with noisy evaluations

E Garcelon, V Avadhanula… - International …, 2022 - proceedings.mlr.press

We consider a multi-armed bandit setting where, at the beginning of each round, the learner
receives noisy independent, and possibly biased, evaluations of the true reward of each arm …

Cited by 6; all 3 versions

Online semi-supervised learning with bandit feedback

M Yurochkin, S Upadhyay, D Bouneffouf, M Agarwal… - 2019 - openreview.net

We formulate a new problem at the intersection of semi-supervised learning and contextual
bandits, motivated by several applications including clinical trials and dialog systems. We …

Cited by 6

Corrupted contextual bandits: Online learning with corrupted context

D Bouneffouf - … 2021-2021 IEEE International Conference on …, 2021 - ieeexplore.ieee.org

We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with
side-information, or context, available to a decision-maker) where the context used at each …

Cited by 5; all 2 versions

Question Answering System with Sparse and Noisy Feedback

D Bouneffouf, O Alkan, R Feraud… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

The rise of personal assistants has made question answering a very popular mechanism for
user-system interaction. In Question Answering System, implicit feedbacks can be easily …

Cited by 1; all 3 versions

Towards Scalability and Robustness for Ranking, Clustering, and Multi-Armed Bandits

P Patil - 2024 - repository.upenn.edu

In recent years, machine learning has become an indispensable tool across various industry
domains, revolutionizing the way businesses leverage data to make decisions. One of the …
