Non-asymptotic analysis of Monte Carlo tree search

F Sun, Y Liu, JX Wang, H Sun - arXiv preprint arXiv:2205.13134, 2022 - arxiv.org

Nonlinear dynamics is ubiquitous in nature and commonly seen in various science and
engineering disciplines. Distilling analytical expressions that govern nonlinear dynamics …

Cited by 55 · Related articles · All 5 versions

Online tree-based planning for active spacecraft fault estimation and collision avoidance

J Ragan, B Riviere, FY Hadaegh, SJ Chung - Science Robotics, 2024 - science.org

Autonomous robots operating in uncertain or hazardous environments subject to state safety
constraints must be able to identify and isolate faulty components in a time-optimal manner …

Cited by 1 · Related articles · All 3 versions

Sample efficient reinforcement learning via low-rank matrix estimation

D Shah, D Song, Z Xu, Y Yang - Advances in Neural …, 2020 - proceedings.neurips.cc

We consider the question of learning $Q$-function in a sample efficient manner for
reinforcement learning with continuous state and action spaces under a generative model. If …

Cited by 47 · Related articles · All 6 versions

Monte Carlo tree search with spectral expansion for planning with dynamical systems

B Rivière, J Lathrop, SJ Chung - Science Robotics, 2024 - science.org

The ability of a robot to plan complex behaviors with real-time computation, rather than
adhering to predesigned or offline-learned routines, alleviates the need for specialized …

Scalable online planning for multi-agent MDPs

S Choudhury, JK Gupta, P Morales… - Journal of Artificial …, 2022 - jair.org

We present a scalable tree search planning algorithm for large multi-agent sequential
decision problems that require dynamic collaboration. Teams of agents need to coordinate …

Cited by 16 · Related articles · All 8 versions

Optimality guarantees for particle belief approximation of POMDPs

MH Lim, TJ Becker, MJ Kochenderfer, CJ Tomlin… - Journal of Artificial …, 2023 - jair.org

Partially observable Markov decision processes (POMDPs) provide a flexible representation
for real-world decision and control problems. However, POMDPs are notoriously difficult to …

Cited by 12 · Related articles · All 7 versions

Monte Carlo tree search methods for the Earth-observing satellite scheduling problem

AP Herrmann, H Schaub - Journal of Aerospace Information Systems, 2022 - arc.aiaa.org

This work explores on-board planning for the single spacecraft, multiple ground station Earth-
observing satellite scheduling problem through artificial neural network function …

Cited by 20 · Related articles · All 4 versions

On the convergence of policy iteration-based reinforcement learning with Monte Carlo policy evaluation

A Winnicki, R Srikant - International Conference on Artificial …, 2023 - proceedings.mlr.press

A common technique in reinforcement learning is to evaluate the value function from Monte
Carlo simulations of a given policy, and use the estimated value function to obtain a new …

Cited by 9 · Related articles · All 5 versions

Can large language models play games? A case study of a self-play approach

H Guo, Z Liu, Y Zhang, Z Wang - arXiv preprint arXiv:2403.05632, 2024 - arxiv.org

Large Language Models (LLMs) harness extensive data from the Internet, storing a broad
spectrum of prior knowledge. While LLMs have proven beneficial as decision-making aids …

Cited by 6 · Related articles · All 2 versions

Neural tree expansion for multi-robot planning in non-cooperative environments

B Riviere, W Hönig, M Anderson… - IEEE Robotics and …, 2021 - ieeexplore.ieee.org

We present a self-improving, Neural Tree Expansion (NTE) method for multi-robot online
planning in non-cooperative environments, where each robot attempts to maximize its …

Cited by 18 · Related articles · All 6 versions