Response surface bandits

WB Powell - European Journal of Operational Research, 2019 - Elsevier

Stochastic optimization is an umbrella term that includes over a dozen fragmented
communities, using a patchwork of sometimes overlapping notational systems with …

Cited by 290 · Related articles · All 4 versions

Google vizier: A service for black-box optimization

D Golovin, B Solnik, S Moitra, G Kochanski… - Proceedings of the 23rd …, 2017 - dl.acm.org

Any sufficiently complex system acts as a black box when it becomes easier to experiment
with than to understand. Hence, black-box optimization has become increasingly important …

Cited by 848 · Related articles · All 15 versions

Linearly parameterized bandits

P Rusmevichientong… - Mathematics of Operations …, 2010 - pubsonline.informs.org

We consider bandit problems involving a large (possibly infinite) collection of arms, in which
the expected reward of each arm is a linear function of an r-dimensional random vector Z∈ …

Cited by 617 · Related articles · All 20 versions

Dynamic assortment with demand learning for seasonal consumer goods

F Caro, J Gallien - Management science, 2007 - pubsonline.informs.org

Companies such as Zara and World Co. have recently implemented novel product
development processes and supply chain architectures enabling them to make more …

Cited by 393 · Related articles · All 18 versions

The knowledge gradient algorithm for a general class of online learning problems

IO Ryzhov, WB Powell, PI Frazier - Operations Research, 2012 - pubsonline.informs.org

We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal
learning problems with Gaussian rewards. Our approach is able to handle the case where …

Cited by 241 · Related articles · All 18 versions

A linear response bandit problem

A Goldenshluger, A Zeevi - Stochastic Systems, 2013 - pubsonline.informs.org

We consider a two-armed bandit problem which involves sequential sampling from two
non-homogeneous populations. The response in each is determined by a random covariate …

Cited by 162 · Related articles · All 8 versions

Bayesian policy reuse

B Rosman, M Hawasly, S Ramamoorthy - Machine Learning, 2016 - Springer

A long-lived autonomous agent should be able to respond online to novel instances of tasks
from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence …

Cited by 96 · Related articles · All 12 versions

A structured multiarmed bandit problem and the greedy policy

AJ Mersereau, P Rusmevichientong… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org

We consider a multiarmed bandit problem where the expected reward of each arm is a
linear function of an unknown scalar with a prior distribution. The objective is to choose a …

Cited by 122 · Related articles · All 19 versions

Apollo: Transferable architecture exploration

A Yazdanbakhsh, C Angermueller, B Akin… - arXiv preprint arXiv …, 2021 - arxiv.org

The looming end of Moore's Law and ascending use of deep learning drives the design of
custom accelerators that are optimized for specific neural architectures. Architecture …

Cited by 24 · Related articles · All 3 versions

A unified framework for optimization under uncertainty

WB Powell - … challenges in complex, networked and risky …, 2016 - pubsonline.informs.org

Stochastic optimization, also known as optimization under uncertainty, is studied by over a
dozen communities, often (but not always) with different notational systems and styles …

Cited by 54 · Related articles

A unified framework for stochastic optimization