Online (multinomial) logistic bandit: Improved regret and constant computation cost

YJ Zhang, M Sugiyama - Advances in Neural Information …, 2024 - proceedings.neurips.cc
This paper investigates the logistic bandit problem, a variant of the generalized linear bandit
model that utilizes a logistic model to depict the feedback from an action. While most existing …

Adapting to online label shift with provable guarantees

Y Bai, YJ Zhang, P Zhao… - Advances in Neural …, 2022 - proceedings.neurips.cc
The standard supervised learning paradigm works effectively when training data shares the
same distribution as the upcoming testing samples. However, this stationary assumption is …

Universal online learning with gradient variations: A multi-layer online ensemble approach

YH Yan, P Zhao, ZH Zhou - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In this paper, we propose an online convex optimization approach with two different levels of
adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures …

A survey of decision making in adversarial games

X Li, M Meng, Y Hong, J Chen - Science China Information Sciences, 2024 - Springer
In many practical applications, such as poker, chess, drug interdiction, cybersecurity, and
national defense, players often have adversarial stances, i.e., the selfish actions of each …

On the convergence of no-regret learning dynamics in time-varying games

I Anagnostides, I Panageas… - Advances in Neural …, 2024 - proceedings.neurips.cc
Most of the literature on learning in games has focused on the restrictive setting where the
underlying repeated game does not change over time. Much less is known about the …

Rethinking data-free quantization as a zero-sum game

B Qian, Y Wang, R Hong, M Wang - … of the AAAI conference on artificial …, 2023 - ojs.aaai.org
Data-free quantization (DFQ) recovers the performance of quantized network (Q) without
accessing the real data, but generates the fake sample via a generator (G) by learning from …

Efficient methods for non-stationary online learning

P Zhao, YF Xie, L Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
Non-stationary online learning has drawn much attention in recent years. In particular,
dynamic regret and adaptive regret are proposed as two principled performance …

Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

P Zhao, YJ Zhang, L Zhang, ZH Zhou - Journal of Machine Learning …, 2024 - jmlr.org
We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …

On the last-iterate convergence in time-varying zero-sum games: Extra gradient succeeds where optimism fails

Y Feng, H Fu, Q Hu, P Li… - Advances in Neural …, 2024 - proceedings.neurips.cc
Last-iterate convergence has received extensive study in two-player zero-sum games
starting from bilinear, convex-concave up to settings that satisfy the MVI condition. Typical …

Non-stationary online learning with memory and non-stochastic control

P Zhao, YH Yan, YX Wang, ZH Zhou - The Journal of Machine Learning …, 2023 - dl.acm.org
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss
functions to depend on past decisions and thus captures temporal effects of learning …