HybridFlow: A Flexible and Efficient RLHF Framework
Reinforcement Learning from Human Feedback (RLHF) is widely used in Large Language
Model (LLM) alignment. Traditional RL can be modeled as a dataflow, where each node …
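The dataflow view of RL can be illustrated with a minimal sketch. It assumes that each node is a named computation stage and that each edge records which earlier stage outputs a node consumes. The stage names (`generate`, `ref_logprob`, `reward`, `value`, `advantage`), the helpers `build_dataflow` and `run`, and the toy arithmetic are all hypothetical placeholders for model calls. They are not the paper's API or its exact graph. Python's standard `graphlib.TopologicalSorter` orders the stages so that every node runs after its inputs are ready.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# Hypothetical RLHF-style dataflow: node -> (parent stages, computation).
# The lambdas are toy stand-ins for neural-network calls, chosen only so the
# data dependencies are visible; they are NOT real model computations.
Stage = Callable[[Dict[str, float]], float]


def build_dataflow() -> Dict[str, Tuple[List[str], Stage]]:
    return {
        "generate": ([], lambda d: 1.0),  # stand-in for an actor rollout
        "ref_logprob": (["generate"], lambda d: d["generate"] * 0.5),
        "reward": (["generate"], lambda d: d["generate"] + 2.0),
        "value": (["generate"], lambda d: d["generate"] * 0.25),
        # Consumes three upstream outputs, so it must run last.
        "advantage": (
            ["reward", "value", "ref_logprob"],
            lambda d: d["reward"] - d["value"] - d["ref_logprob"],
        ),
    }


def run(dataflow: Dict[str, Tuple[List[str], Stage]]):
    # Edges are expressed as node -> set of predecessors for graphlib.
    deps = {name: set(parents) for name, (parents, _) in dataflow.items()}
    order = list(TopologicalSorter(deps).static_order())
    results: Dict[str, float] = {}
    for name in order:
        _, fn = dataflow[name]
        # Topological order guarantees every parent result already exists.
        results[name] = fn(results)
    return order, results


if __name__ == "__main__":
    order, results = run(build_dataflow())
    print(order)
    print(results["advantage"])
```

Representing stages as graph nodes keeps the scheduling concern, which is deciding what may run once its inputs exist, separate from the computation inside each node. That separation is what makes a dataflow framing of RL convenient to reason about.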