Secrets of RLHF in Large Language Models Part I: PPO
Rui Zheng, Shihan Dou, Songyang Gao, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Limao Xiong, Lu Chen, Zhiheng Xi, Yuhao Zhou, Nuo Xu, Wenbin Lai, Minghao Zhu, Rongxiang Weng, Wensen Cheng, Cheng Chang, Zhangyue Yin, Yuan Hua, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang. CoRR (2023)
