A Tutorial on Multi-Armed Bandit Applications for Large Language Models

D Bouneffouf, R Féraud - Proceedings of the 30th ACM SIGKDD …, 2024 - dl.acm.org
This tutorial offers a comprehensive guide on using multi-armed bandit (MAB) algorithms to
improve Large Language Models (LLMs). As Natural Language Processing (NLP) tasks …

Efficient methods in counterfactual policy learning and sequential decision making

H Zenati - 2023 - theses.hal.science
Because logged data has become ubiquitous in a wide range of applications and since
online exploration may be sensitive, counterfactual methods have gained significant …

Semi-supervised batch learning from logged data

G Aminian, A Behnamnia, R Vega, L Toni, C Shi… - openreview.net
Offline policy learning methods are intended to learn a policy from logged data, which
includes context, action, and reward for each sample point. In this work we build on the …

[PDF] Counterfactual Estimation from Logged Data

R Féraud - 2023 - researchgate.net
Counterfactual Estimation from Logged Data. Raphaël Féraud, Orange Innovation, March 2023 …