No-regret learning in time-varying zero-sum games

YJ Zhang, M Sugiyama - Advances in Neural Information …, 2024 - proceedings.neurips.cc

This paper investigates the logistic bandit problem, a variant of the generalized linear bandit
model that utilizes a logistic model to depict the feedback from an action. While most existing …

Cited by 10 Related articles All 4 versions

[PDF] neurips.cc

Adapting to online label shift with provable guarantees

Y Bai, YJ Zhang, P Zhao… - Advances in Neural …, 2022 - proceedings.neurips.cc

The standard supervised learning paradigm works effectively when training data shares the
same distribution as the upcoming testing samples. However, this stationary assumption is …

Cited by 28 Related articles All 9 versions

[PDF] neurips.cc

Universal online learning with gradient variations: A multi-layer online ensemble approach

YH Yan, P Zhao, ZH Zhou - Advances in Neural Information …, 2023 - proceedings.neurips.cc

In this paper, we propose an online convex optimization approach with two different levels of
adaptivity. On a higher level, our approach is agnostic to the unknown types and curvatures …

Cited by 13 Related articles All 7 versions

[PDF] arxiv.org

A survey of decision making in adversarial games

X Li, M Meng, Y Hong, J Chen - Science China Information Sciences, 2024 - Springer

In many practical applications, such as poker, chess, drug interdiction, cybersecurity, and
national defense, players often have adversarial stances, i.e., the selfish actions of each …

Cited by 16 Related articles All 3 versions

[PDF] neurips.cc

On the convergence of no-regret learning dynamics in time-varying games

I Anagnostides, I Panageas… - Advances in Neural …, 2024 - proceedings.neurips.cc

Most of the literature on learning in games has focused on the restrictive setting where the
underlying repeated game does not change over time. Much less is known about the …

Cited by 18 Related articles All 5 versions

[PDF] aaai.org

Rethinking data-free quantization as a zero-sum game

B Qian, Y Wang, R Hong, M Wang - … of the AAAI conference on artificial …, 2023 - ojs.aaai.org

Data-free quantization (DFQ) recovers the performance of quantized network (Q) without
accessing the real data, but generates the fake sample via a generator (G) by learning from …

Cited by 18 Related articles All 4 versions

[PDF] neurips.cc

Efficient methods for non-stationary online learning

P Zhao, YF Xie, L Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc

Non-stationary online learning has drawn much attention in recent years. In particular, dynamic
regret and adaptive regret are proposed as two principled performance …

Cited by 21 Related articles All 12 versions

[PDF] jmlr.org

Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

P Zhao, YJ Zhang, L Zhang, ZH Zhou - Journal of Machine Learning …, 2024 - jmlr.org

We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …

Cited by 41 Related articles All 5 versions

[PDF] neurips.cc

On the last-iterate convergence in time-varying zero-sum games: Extra gradient succeeds where optimism fails

Y Feng, H Fu, Q Hu, P Li… - Advances in Neural …, 2024 - proceedings.neurips.cc

Last-iterate convergence has received extensive study in two-player zero-sum games
starting from bilinear, convex-concave up to settings that satisfy the MVI condition. Typical …

Cited by 6 Related articles All 5 versions

[PDF] jmlr.org

Non-stationary online learning with memory and non-stochastic control

P Zhao, YH Yan, YX Wang, ZH Zhou - The Journal of Machine Learning …, 2023 - dl.acm.org

We study the problem of Online Convex Optimization (OCO) with memory, which allows loss
functions to depend on past decisions and thus captures temporal effects of learning …

Cited by 49 Related articles All 8 versions