LightZero: A unified benchmark for Monte Carlo tree search in general sequential decision scenarios

Z Wan, X Feng, M Wen, SM McAleer, Y Wen… - … on Machine Learning, 2024 - openreview.net

Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the multi-step reasoning capabilities of LLMs by using tree-search algorithms. These …

Cited by 2 Related articles

LocalValueBench: A collaboratively built and extensible benchmark for evaluating localized value alignment and ethical safety in large language models

GI Meadows, NWL Lau, EA Susanto, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

The proliferation of large language models (LLMs) requires robust evaluation of their
alignment with local values and ethical standards, especially as existing benchmarks often …

Cited by 9 Related articles

Towards high efficient long-horizon planning with expert-guided motion-encoding tree search

T Zhou, E Lyu, G Cen, Z Zha, S Qi… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org

Autonomous driving holds promise for increased safety, optimized traffic management, and
a new level of convenience in transportation. While model-based reinforcement learning …

Cited by 2 Related articles All 3 versions

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

C Xuan, Y Niu, Y Pu, S Hu, J Yang - arXiv preprint arXiv:2404.16364, 2024 - arxiv.org

MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread
success in various decision-making domains. These algorithms employ the reanalyze …

Cited by 1 Related articles All 3 versions

Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement

J Pirnay, DG Grimm - arXiv preprint arXiv:2403.15180, 2024 - arxiv.org

Current methods for end-to-end constructive neural combinatorial optimization usually train
a policy using behavior cloning from expert solutions or policy gradient methods from …

Cited by 4 Related articles All 3 versions

MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

TR Wu, H Guei, PC Peng, PW Huang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

This paper presents MiniZero, a zero-knowledge learning framework that supports four state-
of-the-art algorithms, including AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel …

Cited by 2 Related articles All 3 versions

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Y Pu, Y Niu, J Ren, Z Yang, H Li, Y Liu - arXiv preprint arXiv:2406.10667, 2024 - arxiv.org

Learning predictive world models is essential for enhancing the planning capabilities of
reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value …

SLAMuZero: Plan and Learn to Map for Joint SLAM and Navigation

B Fang, X Chen, Z Pan, X Di - Proceedings of the International …, 2024 - ojs.aaai.org

MuZero has demonstrated remarkable performance in board and video games where Monte
Carlo tree search (MCTS) method is utilized to learn and adapt to different game …

Cited by 1 Related articles All 3 versions

Towards Conscious RL Agents By Construction

A Nachkov - Proceedings of the Annual Meeting of the Cognitive …, 2024 - escholarship.org

The nature of consciousness has been a long-debated concept related to human cognition
and self-understanding. As AI systems become more capable and autonomous, it is an …