Hypermodels for exploration

Epistemic neural networks

I Osband, Z Wen, SM Asghari… - Advances in …, 2023 - proceedings.neurips.cc

Intelligence relies on an agent's knowledge of what it does not know. This capability can be
assessed based on the quality of joint predictions of labels across multiple inputs. In …

Cited by 120 - Related articles - All 6 versions

Reinforcement learning, bit by bit

X Lu, B Van Roy, V Dwaracherla… - … and Trends® in …, 2023 - nowpublishers.com

Reinforcement learning agents have demonstrated remarkable achievements in simulated
environments. Data efficiency poses an impediment to carrying this success over to real …

Cited by 79 - Related articles - All 4 versions

Posterior meta-replay for continual learning

C Henning, M Cervera, F D'Angelo… - Advances in neural …, 2021 - proceedings.neurips.cc

Learning a sequence of tasks without access to iid observations is a widely studied form of
continual learning (CL) that remains challenging. In principle, Bayesian learning directly …

Cited by 64 - Related articles - All 12 versions

Scalable neural contextual bandit for recommender systems

Z Zhu, B Van Roy - Proceedings of the 32nd ACM International …, 2023 - dl.acm.org

High-quality recommender systems ought to deliver both innovative and relevant content
through effective and exploratory interactions with users. Yet, supervised learning-based …

Cited by 15 - Related articles - All 4 versions

Controllable Pareto multi-task learning

X Lin, Z Yang, Q Zhang, S Kwong - 2020 - openreview.net

A multi-task learning (MTL) system aims at solving multiple related tasks at the same time.
With a fixed model capacity, the tasks would be conflicted with each other, and the system …

Cited by 76 - Related articles - All 3 versions

Approximate Thompson sampling via epistemic neural networks

I Osband, Z Wen, SM Asghari… - Uncertainty in …, 2023 - proceedings.mlr.press

Thompson sampling (TS) is a popular heuristic for action selection, but it requires sampling
from a posterior distribution. Unfortunately, this can become computationally intractable in …

Cited by 23 - Related articles - All 7 versions

Efficient exploration for LLMs

V Dwaracherla, SM Asghari, B Hao… - arXiv preprint arXiv …, 2024 - arxiv.org

We present evidence of substantial benefit from efficient exploration in gathering human
feedback to improve large language models. In our experiments, an agent sequentially …

Cited by 15 - Related articles - All 3 versions

Visual affordance prediction for guiding robot exploration

H Bharadhwaj, A Gupta… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Motivated by the intuitive understanding humans have about the space of possible
interactions, and the ease with which they can generalize this understanding to previously …

Cited by 16 - Related articles - All 3 versions

Uncertainty estimation for language reward models

A Gleave, G Irving - arXiv preprint arXiv:2203.07472, 2022 - arxiv.org

Language models can learn a range of capabilities from unsupervised training on text
corpora. However, to solve a particular problem (such as text summarization) it is typically …

Cited by 26 - Related articles - All 2 versions

Meta-learning via hypernetworks

D Zhao, S Kobayashi… - 4th Workshop on …, 2020 - research-collection.ethz.ch

Recent developments in few-shot learning have shown that during fast adaption,
gradient-based meta-learners mostly rely on embedding features of powerful pretrained networks …

Cited by 56 - Related articles - All 8 versions

Epistemic neural networks
