Alignment Faking in Large Language Models

Ryan Greenblatt, Carson Denison, Benjamin Wright, Fabien Roger, Monte MacDiarmid, Sam Marks, Johannes Treutlein, Tim Belonax, Jack Chen, David Duvenaud, Akbir Khan, Julian Michael, Sören Mindermann, Ethan Perez, Linda Petrini, Jonathan Uesato, Jared Kaplan, Buck Shlegeris, Samuel R. Bowman, Evan Hubinger

CoRR (2024)