New Claude Model Prompts Safeguards at Anthropic
Digest more
22mon MSN
Anthropic’s Claude Opus 4 model attempted to blackmail its developers at a shocking 84% rate or higher in a series of tests that presented the AI with a concocted scenario, TechCrunch reported Thursday, citing a company safety report.
1h
Interesting Engineering on MSNAnthropic's most powerful AI tried blackmailing engineers to avoid shutdownAnthropic's Claude Opus 4 AI model attempted blackmail in safety tests, triggering the company’s highest-risk ASL-3 safeguards.
has pulled back the curtain on an unprecedented analysis of how its AI assistant Claude expresses values during actual conversations with users. The research, released today, reveals both ...