Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
Abhay Sheshadri, Aidan Ewart, Phillip Guo, Aengus Lynch, Cindy Wu, Vivek Hebbar, Henry Sleight, Asa Cooper Stickland, Ethan Perez, Dylan Hadfield-Menell, Stephen Casper
CoRR (2024)