Large Language Models as Agents in Two-Player Games

Y Liu, P Sun, H Li - arXiv preprint arXiv:2402.08078, 2024 - arxiv.org
By formally defining the training processes of large language models (LLMs), which usually
encompass pre-training, supervised fine-tuning, and reinforcement learning with human …

Semi-supervised batch learning from logged data

G Aminian, A Behnamnia, R Vega, L Toni, C Shi… - openreview.net
Offline policy learning methods are intended to learn a policy from logged data, which
includes context, action, and reward for each sample point. In this work we build on the …
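The logged-data setting described in this snippet, one context, one logged action, and one observed reward per sample, is the standard off-policy learning setup. Below is a minimal sketch of inverse-propensity-scoring (IPS) off-policy value estimation on such data; it assumes the logging propensities are known, uses synthetic data, and illustrates the common baseline rather than the method proposed in the paper.

import numpy as np

# Logged bandit data: one context x_i, one logged action a_i, and one observed
# reward r_i per sample, plus the logging policy's propensity pi_0(a_i | x_i).
# Everything here is synthetic and purely illustrative.
rng = np.random.default_rng(0)
n, n_actions, d = 1000, 5, 8
contexts = rng.normal(size=(n, d))
logged_actions = rng.integers(0, n_actions, size=n)
rewards = rng.binomial(1, 0.3, size=n).astype(float)
propensities = np.full(n, 1.0 / n_actions)   # uniform logging policy

def ips_value(target_probs, rewards, propensities):
    # target_probs[i] = pi(a_i | x_i): probability the target policy assigns
    # to the action that was actually logged for sample i.
    weights = target_probs / propensities
    return np.mean(weights * rewards)

# Off-policy value estimate for a (hypothetical) uniform target policy.
target_probs = np.full(n, 1.0 / n_actions)
print("IPS value estimate:", ips_value(target_probs, rewards, propensities))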

Batch Learning via Log-Sum-Exponential Estimator from Logged Bandit Feedback

A Behnamnia, G Aminian, A Aghaei, C Shi… - ICML 2024 Workshop … - openreview.net
Offline policy learning methods in batch learning aim to derive a policy from a logged bandit
feedback dataset, encompassing context, action, propensity score and feedback for each …
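The title names a log-sum-exponential (LSE) estimator for batch learning from logged bandit feedback. As a hedged sketch of the general LSE idea, the snippet below replaces the plain empirical mean of importance-weighted feedback with a LogSumExp-style aggregation whose temperature damps heavy-tailed importance weights; the parameterization and the name lse_estimator are illustrative assumptions, not necessarily the estimator defined in the paper.

import numpy as np

def lse_estimator(weighted_rewards, lam=-1.0):
    # LogSumExp-style aggregation: (1/lam) * log( mean( exp(lam * x_i) ) ),
    # where x_i are the importance-weighted rewards. As lam -> 0 this tends
    # to the plain empirical mean (the IPS estimate); a negative lam damps
    # the influence of very large importance weights.
    z = lam * np.asarray(weighted_rewards, dtype=float)
    m = z.max()                      # subtract the max for numerical stability
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

# Heavy-tailed importance weights (log-normal) times binary feedback.
rng = np.random.default_rng(1)
weights = np.exp(rng.normal(0.0, 2.0, size=1000))
feedback = rng.binomial(1, 0.3, size=1000)
weighted = weights * feedback
print("plain IPS mean:", weighted.mean())
print("LSE estimate  :", lse_estimator(weighted, lam=-1.0))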