Online (multinomial) logistic bandit: Improved regret and constant computation cost
YJ Zhang, M Sugiyama - Advances in Neural Information …, 2024 - proceedings.neurips.cc
This paper investigates the logistic bandit problem, a variant of the generalized linear bandit
model that utilizes a logistic model to depict the feedback from an action. While most existing …
Adapting to online label shift with provable guarantees
The standard supervised learning paradigm works effectively when training data shares the
same distribution as the upcoming testing samples. However, this stationary assumption is …
Universal online learning with gradient variations: A multi-layer online ensemble approach
In this paper, we propose an online convex optimization approach with two different levels of
adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures …
A survey of decision making in adversarial games
In many practical applications, such as poker, chess, drug interdiction, cybersecurity, and
national defense, players often have adversarial stances, i.e., the selfish actions of each …
On the convergence of no-regret learning dynamics in time-varying games
I Anagnostides, I Panageas… - Advances in Neural …, 2024 - proceedings.neurips.cc
Most of the literature on learning in games has focused on the restrictive setting where the
underlying repeated game does not change over time. Much less is known about the …
Rethinking data-free quantization as a zero-sum game
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without
accessing the real data, but generates the fake sample via a generator (G) by learning from …
Efficient methods for non-stationary online learning
Non-stationary online learning has drawn much attention in recent years. In particular, dynamic
regret and adaptive regret are proposed as two principled performance …
Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization
We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …
On the last-iterate convergence in time-varying zero-sum games: Extra gradient succeeds where optimism fails
Last-iterate convergence has received extensive study in two player zero-sum games
starting from bilinear, convex-concave up to settings that satisfy the MVI condition. Typical …
Non-stationary online learning with memory and non-stochastic control
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss
functions to depend on past decisions and thus captures temporal effects of learning …