A unified framework for stochastic optimization

WB Powell - European Journal of Operational Research, 2019 - Elsevier
Stochastic optimization is an umbrella term that includes over a dozen fragmented
communities, using a patchwork of sometimes overlapping notational systems with …

Google vizier: A service for black-box optimization

D Golovin, B Solnik, S Moitra, G Kochanski… - Proceedings of the 23rd …, 2017 - dl.acm.org
Any sufficiently complex system acts as a black box when it becomes easier to experiment
with than to understand. Hence, black-box optimization has become increasingly important …

Linearly parameterized bandits

P Rusmevichientong… - Mathematics of Operations …, 2010 - pubsonline.informs.org
We consider bandit problems involving a large (possibly infinite) collection of arms, in which
the expected reward of each arm is a linear function of an r-dimensional random vector Z∈ …

Dynamic assortment with demand learning for seasonal consumer goods

F Caro, J Gallien - Management Science, 2007 - pubsonline.informs.org
Companies such as Zara and World Co. have recently implemented novel product
development processes and supply chain architectures enabling them to make more …

The knowledge gradient algorithm for a general class of online learning problems

IO Ryzhov, WB Powell, PI Frazier - Operations Research, 2012 - pubsonline.informs.org
We derive a one-period look-ahead policy for finite- and infinite-horizon online optimal
learning problems with Gaussian rewards. Our approach is able to handle the case where …

A linear response bandit problem

A Goldenshluger, A Zeevi - Stochastic Systems, 2013 - pubsonline.informs.org
We consider a two-armed bandit problem which involves sequential sampling from two
non-homogeneous populations. The response in each is determined by a random covariate …

Bayesian policy reuse

B Rosman, M Hawasly, S Ramamoorthy - Machine Learning, 2016 - Springer
A long-lived autonomous agent should be able to respond online to novel instances of tasks
from a familiar domain. Acting online requires 'fast' responses, in terms of rapid convergence …

A structured multiarmed bandit problem and the greedy policy

AJ Mersereau, P Rusmevichientong… - IEEE Transactions on …, 2009 - ieeexplore.ieee.org
We consider a multiarmed bandit problem where the expected reward of each arm is a
linear function of an unknown scalar with a prior distribution. The objective is to choose a …

Apollo: Transferable architecture exploration

A Yazdanbakhsh, C Angermueller, B Akin… - arXiv preprint arXiv …, 2021 - arxiv.org
The looming end of Moore's Law and ascending use of deep learning drives the design of
custom accelerators that are optimized for specific neural architectures. Architecture …

A unified framework for optimization under uncertainty

WB Powell - … challenges in complex, networked and risky …, 2016 - pubsonline.informs.org
Stochastic optimization, also known as optimization under uncertainty, is studied by over a
dozen communities, often (but not always) with different notational systems and styles …