GEAR: a GPU-centric experience replay system for large reinforcement learning models

Citing articles: 1 result

HybridFlow: A flexible and efficient RLHF framework

G Sheng, C Zhang, Z Ye, X Wu, W Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language
Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node …

Cited by 1 · Related articles · All 4 versions