Learning on the edge: Online learning with stochastic feedback graphs
E Esposito, F Fusco… - Advances in …, 2022 - proceedings.neurips.cc
The framework of feedback graphs is a generalization of sequential decision-making with
bandit or full information feedback. In this work, we study an extension where the directed …
A near-optimal best-of-both-worlds algorithm for online learning with feedback graphs
C Rouyer, D van der Hoeven… - Advances in …, 2022 - proceedings.neurips.cc
We consider online learning with feedback graphs, a sequential decision-making framework
where the learner's feedback is determined by a directed graph over the action set. We …
An α-regret analysis of Adversarial Bilateral Trade
We study sequential bilateral trade where sellers' and buyers' valuations are completely
arbitrary (i.e., determined by an adversary). Sellers and buyers are strategic agents with …
Online learning with set-valued feedback
We study a variant of online multiclass classification where the learner predicts a single
label but receives a set of labels as feedback. In this model, the learner is penalized …
Practical contextual bandits with feedback graphs
While contextual bandit has a mature theory, effectively leveraging different feedback
patterns to enhance the pace of learning remains unclear. Bandits with feedback graphs …
A regret-variance trade-off in online learning
D Van der Hoeven, N Zhivotovskiy… - Advances in Neural …, 2022 - proceedings.neurips.cc
We consider prediction with expert advice for strongly convex and bounded losses, and
investigate trade-offs between regret and "variance" (i.e., squared difference of learner's …
Online Structured Prediction with Fenchel–Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss
This paper studies online structured prediction with full-information feedback. For online
multiclass classification, van der Hoeven (2020) has obtained surrogate regret bounds …
Neural Active Learning Meets the Partial Monitoring Framework
M Heuillet, O Ahmad, A Durand - arXiv preprint arXiv:2405.08921, 2024 - arxiv.org
We focus on the online-based active learning (OAL) setting where an agent operates over a
stream of observations and trades off between the costly acquisition of information (labelled …
Trading-off payments and accuracy in online classification with paid stochastic experts
D Van Der Hoeven, C Pike-Burke… - International …, 2023 - proceedings.mlr.press
We investigate online classification with paid stochastic experts. Here, before making their
prediction, each expert must be paid. The amount that we pay each expert directly …
An α-regret analysis of adversarial bilateral trade
We study sequential bilateral trade where sellers' and buyers' valuations are completely
arbitrary (i.e., determined by an adversary). Sellers and buyers are strategic agents with …