Reward Model and RLHF
Course Outline
Recommended Reading
- [Paper] Training language models to follow instructions with human feedback
- [Paper] Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- [Paper] BARTScore: Evaluating Generated Text as Text Generation
- [Paper] GPTScore: Evaluate as You Desire
- [Paper] Can Large Language Models be Trusted for Evaluation? Scalable Meta-Evaluation of LLMs as Evaluators via Agent Debate
Acknowledgements
- Thanks to Yixiu Liu for helping prepare the slides on instruction learning.