The Trickle-Down Impact of Reward (In-)Consistency on RLHF
Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves
optimizing against a Reward Model (RM), which itself is trained to reflect human preferences …
SA-SGRU: combining improved self-attention and skip-GRU for text classification
Y Huang, X Dai, J Yu, Z Huang - Applied Sciences, 2023 - mdpi.com
When reading texts for text classification tasks, a large number of words are irrelevant, and
in text classification tasks, the traditional self-attention mechanism has the problem of weight …
Text Generation: A Systematic Literature Review of Tasks, Evaluation, and Challenges
Text generation has become more accessible than ever, and the increasing interest in these
systems, especially those using large language models, has spurred an increasing number …
TextShield: Beyond Successfully Detecting Adversarial Sentences in Text Classification
Adversarial attack serves as a major challenge for neural network models in NLP, which
precludes the model's deployment in safety-critical applications. A recent line of work …
It Takes Two: On the Seamlessness between Reward and Policy Model in RLHF
Reinforcement Learning from Human Feedback (RLHF) involves training policy models
(PMs) and reward models (RMs) to align language models with human preferences. Instead …