Deep bandits show-off: Simple and efficient exploration with deep networks

PE Iturria-Rivera, M Chenier… - … Machine Learning in …, 2024 - ieeexplore.ieee.org

The exponential increase in the demand for high-performance services such as streaming
video and gaming by wireless devices has posed several challenges for Wireless Local …

Cited by 6 Related articles All 3 versions

[PDF] arxiv.org

Two-Stage Neural Contextual Bandits for Personalised News Recommendation

M Zhang, T Nguyen-Tang, F Wu, Z He, X Xie… - arXiv preprint arXiv …, 2022 - arxiv.org

We consider the problem of personalised news recommendation where each user
consumes news in a sequential fashion. Existing personalised news recommendation …

Cited by 5 Related articles All 2 versions

[PDF] arxiv.org

Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions

K Xu, F Tajaddodianfar, B Allison - arXiv preprint arXiv:2406.10795, 2024 - arxiv.org

Recently proposed reward-conditioned policies (RCPs) offer an appealing alternative in
reinforcement learning. Compared with policy gradient methods, policy learning in RCPs is …

[PDF] openreview.net

MC Layer Normalization for calibrated uncertainty in Deep Learning

T Frick, D Antognini, I Giurgiu, BF Grewe… - … on Machine Learning …, 2024 - openreview.net

Efficiently estimating the uncertainty of neural network predictions has become an
increasingly important challenge as machine learning models are adopted for high-stakes …

UCB Exploration for Fixed-Budget Bayesian Best Arm Identification

RJB Zhu, Y Qiu - arXiv preprint arXiv:2408.04869, 2024 - arxiv.org

We study best-arm identification (BAI) in the fixed-budget setting. Adaptive allocations based
on upper confidence bounds (UCBs), such as UCBE, are known to work well in BAI …

Meta-Bandit: Spatial Reuse Adaptation via Meta-Learning in Distributed Wi-Fi 802.11ax

PE Iturria-Rivera, M Chenier, B Herscovici… - IEEE Networking …, 2023 - ieeexplore.ieee.org

IEEE 802.11ax introduces several amendments to previous standards with a special
interest in spatial reuse (SR) to respond to dense user scenarios with high demanding …

[PDF] ox.ac.uk

Reinforcement learning for bandits with continuous actions and large context spaces

P Duckworth, KA Vallis, B Lacerda, N Hawes - 2023 - ora.ox.ac.uk

We consider the challenging scenario of contextual bandits with continuous actions and
large context spaces. This is an increasingly important application area in personalised …

Cited by 1 Related articles All 2 versions

[PDF] openreview.net

$\delta^2$-exploration for Reinforcement Learning

R Zhu, M Rigotti - openreview.net

Effectively tackling the exploration-exploitation dilemma is still a major challenge in
reinforcement learning. Uncertainty-based exploration strategies developed in the bandit …