Alphazero-like tree-search can guide large language model decoding and training
Recent works like Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aim to augment
the multi-step reasoning capabilities of LLMs by using tree-search algorithms. These …
Localvaluebench: A collaboratively built and extensible benchmark for evaluating localized value alignment and ethical safety in large language models
GI Meadows, NWL Lau, EA Susanto, CL Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
The proliferation of large language models (LLMs) requires robust evaluation of their
alignment with local values and ethical standards, especially as existing benchmarks often …
Towards high efficient long-horizon planning with expert-guided motion-encoding tree search
Autonomous driving holds promise for increased safety, optimized traffic management, and
a new level of convenience in transportation. While model-based reinforcement learning …
ReZero: Boosting MCTS-based Algorithms by Just-in-Time and Speedy Reanalyze
MCTS-based algorithms, such as MuZero and its derivatives, have achieved widespread
success in various decision-making domains. These algorithms employ the reanalyze …
Self-Improvement for Neural Combinatorial Optimization: Sample without Replacement, but Improvement
Current methods for end-to-end constructive neural combinatorial optimization usually train
a policy using behavior cloning from expert solutions or policy gradient methods from …
MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games
This paper presents MiniZero, a zero-knowledge learning framework that supports four state-
of-the-art algorithms, including AlphaZero, MuZero, Gumbel AlphaZero, and Gumbel …
UniZero: Generalized and Efficient Planning with Scalable Latent World Models
Y Pu, Y Niu, J Ren, Z Yang, H Li, Y Liu - arXiv preprint arXiv:2406.10667, 2024 - arxiv.org
Learning predictive world models is essential for enhancing the planning capabilities of
reinforcement learning agents. Notably, the MuZero-style algorithms, based on the value …
SLAMuZero: Plan and Learn to Map for Joint SLAM and Navigation
MuZero has demonstrated remarkable performance in board and video games where Monte
Carlo tree search (MCTS) method is utilized to learn and adapt to different game …
Towards Conscious RL Agents By Construction
A Nachkov - Proceedings of the Annual Meeting of the Cognitive …, 2024 - escholarship.org
The nature of consciousness has been a long-debated concept related to human cognition
and self-understanding. As AI systems become more capable and autonomous, it is an …