Cooperate or not Cooperate: Transfer Learning with Multi-Armed Bandit for Spatial Reuse in Wi-Fi

PE Iturria-Rivera, M Chenier… - … Machine Learning in …, 2024 - ieeexplore.ieee.org
The exponential increase in the demand for high-performance services such as streaming
video and gaming by wireless devices has posed several challenges for Wireless Local …

Two-Stage Neural Contextual Bandits for Personalised News Recommendation

M Zhang, T Nguyen-Tang, F Wu, Z He, X Xie… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider the problem of personalised news recommendation where each user
consumes news in a sequential fashion. Existing personalised news recommendation …

Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions

K Xu, F Tajaddodianfar, B Allison - arXiv preprint arXiv:2406.10795, 2024 - arxiv.org
Recently proposed reward-conditioned policies (RCPs) offer an appealing alternative in
reinforcement learning. Compared with policy gradient methods, policy learning in RCPs is …

MC Layer Normalization for calibrated uncertainty in Deep Learning

T Frick, D Antognini, I Giurgiu, BF Grewe… - … on Machine Learning …, 2024 - openreview.net
Efficiently estimating the uncertainty of neural network predictions has become an
increasingly important challenge as machine learning models are adopted for high-stakes …

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

RJB Zhu, Y Qiu - arXiv preprint arXiv:2408.04869, 2024 - arxiv.org
We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based
on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI …

Meta-Bandit: Spatial Reuse Adaptation via Meta-Learning in Distributed Wi-Fi 802.11ax

PE Iturria-Rivera, M Chenier, B Herscovici… - IEEE Networking …, 2023 - ieeexplore.ieee.org
IEEE 802.11ax introduces several amendments to previous standards with a special
interest in spatial reuse (SR) to respond to dense user scenarios with high demanding …

Reinforcement learning for bandits with continuous actions and large context spaces

P Duckworth, KA Vallis, B Lacerda, N Hawes - 2023 - ora.ox.ac.uk
We consider the challenging scenario of contextual bandits with continuous actions and
large context spaces. This is an increasingly important application area in personalised …

$\delta^2$-exploration for Reinforcement Learning

R Zhu, M Rigotti - openreview.net
Effectively tackling the exploration-exploitation dilemma is still a major challenge in
reinforcement learning. Uncertainty-based exploration strategies developed in the bandit …