
Constitutional Classifiers: Defending Against Universal Jailbreaks Across Thousands of Hours of Red Teaming

Mrinank Sharma, Meg Tong, Jesse Mu, Jerry Wei, Jorrit Kruthoff, Scott Goodfriend, Euan Ong, Alwin Peng, Raj Agarwal, Cem Anil, Amanda Askell, Nathan Bailey, Joe Benton, Emma Bluemke, Samuel R. Bowman, Eric Christiansen, Hoagy Cunningham, Andy Dau, Anjali Gopal, Rob Gilson, Logan Graham, Logan Howard, Nimit Kalra, Taesung Lee, Kevin Lin, Peter Lofgren, Francesco Mosconi, Clare O'Hara, Catherine Olsson, Linda Petrini, Samir Rajani, Nikhil Saxena, Alex Silverstein, Tanya Singh, Theodore Sumers, Leonard Tang, Kevin K. Troy, Constantin Weisser, Ruiqi Zhong, Giulio Zhou, Jan Leike, Jared Kaplan, Ethan Perez

CoRR (2025)
