AlphaZero-like tree-search can guide large language model decoding and training

Z Wan, X Feng, M Wen, SM McAleer, Y Wen… - … on Machine Learning, 2024 - openreview.net
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the multi-step reasoning capabilities of LLMs by using tree-search algorithms. These …

LocalValueBench: A collaboratively built and extensible benchmark for evaluating localized value alignment and ethical safety in large language models

GI Meadows, NWL Lau, EA Susanto, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of large language models (LLMs) requires robust evaluation of their
alignment with local values and ethical standards, especially as existing benchmarks often …

Towards high efficient long-horizon planning with expert-guided motion-encoding tree search

T Zhou, E Lyu, G Cen, Z Zha, S Qi… - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
Autonomous driving holds promise for increased safety, optimized traffic management, and
a new level of convenience in transportation. While model-based reinforcement learning …

ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze

C Xuan, Y Niu, Y Pu, S Hu, J Yang - arXiv preprint arXiv:2404.16364, 2024 - arxiv.org
MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread
success in various decision-making domains. These algorithms employ the reanalyze …

Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement

J Pirnay, DG Grimm - arXiv preprint arXiv:2403.15180, 2024 - arxiv.org
Current methods for end-to-end constructive neural combinatorial optimization usually train
a policy using behavior cloning from expert solutions or policy gradient methods from …

MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games

TR Wu, H Guei, PC Peng, PW Huang… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
This paper presents MiniZero, a zero-knowledge learning framework that supports four
state-of-the-art algorithms, including AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel …

UniZero: Generalized and Efficient Planning with Scalable Latent World Models

Y Pu, Y Niu, J Ren, Z Yang, H Li, Y Liu - arXiv preprint arXiv:2406.10667, 2024 - arxiv.org
Learning predictive world models is essential for enhancing the planning capabilities of
reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value …

SLAMuZero: Plan and Learn to Map for Joint SLAM and Navigation

B Fang, X Chen, Z Pan, X Di - Proceedings of the International …, 2024 - ojs.aaai.org
MuZero has demonstrated remarkable performance in board and video games where Monte
Carlo tree search (MCTS) method is utilized to learn and adapt to different game …

Towards Conscious RL Agents By Construction

A Nachkov - Proceedings of the Annual Meeting of the Cognitive …, 2024 - escholarship.org
The nature of consciousness has been a long-debated concept related to human cognition
and self-understanding. As AI systems become more capable and autonomous, it is an …