Nearly minimax algorithms for linear bandits with shared representation

J Yang, Q Lei, JD Lee, SS Du - arXiv preprint arXiv:2203.15664, 2022 - arxiv.org
We give novel algorithms for multi-task and lifelong linear bandits with shared
representation. Specifically, we consider the setting where we play $ M $ linear bandits with …

Optimal gradient-based algorithms for non-concave bandit optimization

B Huang, K Huang, S Kakade, JD Lee… - Advances in …, 2021 - proceedings.neurips.cc
Bandit problems with linear or concave reward have been extensively studied, but relatively
few works have studied bandits with non-concave reward. This work considers a large family …

Context-lumpable stochastic bandits

CW Lee, Q Liu, Y Abbasi Yadkori… - Advances in …, 2024 - proceedings.neurips.cc
We consider a contextual bandit problem with $ S $ contexts and $ K $ actions. In each
round $ t = 1, 2, \dots $ the learner observes a random context and chooses an action based …

Multi-task representation learning for pure exploration in linear bandits

Y Du, L Huang, W Sun - International Conference on …, 2023 - proceedings.mlr.press
Despite the recent success of representation learning in sequential decision making, the
study of the pure exploration scenario (i.e., identify the best option and minimize the sample …

Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states

J Lumbreras, E Haapasalo, M Tomamichel - Quantum, 2022 - quantum-journal.org
We initiate the study of tradeoffs between exploration and exploitation in online learning of
properties of quantum states. Given sequential oracle access to an unknown quantum state …

Gaussian imagination in bandit learning

Y Liu, AM Devraj, B Van Roy, K Xu - arXiv preprint arXiv:2201.01902, 2022 - arxiv.org
Assuming distributions are Gaussian often facilitates computations that are otherwise
intractable. We study the performance of an agent that attains a bounded information ratio …

Sample complexity for quadratic bandits: Hessian dependent bounds and optimal algorithms

Q Yu, Y Wang, B Huang, Q Lei… - Advances in Neural …, 2023 - proceedings.neurips.cc
In stochastic zeroth-order optimization, a problem of practical relevance is understanding
how to fully exploit the local geometry of the underlying objective function. We consider a …

Statistical complexity and optimal algorithms for nonlinear ridge bandits

N Rajaraman, Y Han, J Jiao… - The Annals of …, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 6, 2557–2582
https://doi.org/10.1214/24-AOS2395 © Institute of …

Optimal Sample Complexity Bounds for Non-convex Optimization under Kurdyka-Lojasiewicz Condition

Q Yu, Y Wang, B Huang, Q Lei… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Optimization of smooth reward functions under bandit feedback is a long-standing problem
in online learning. This paper approaches this problem by studying the convergence under …

Robust Gradient Descent for Phase Retrieval

A Buna, P Rebeschini - arXiv preprint arXiv:2410.10623, 2024 - arxiv.org
Recent progress in robust statistical learning has mainly tackled convex problems, like mean
estimation or linear regression, with non-convex challenges receiving less attention. Phase …