Claude AI Safeguards - Search News

Order byBest matchMost fresh

AI, Anthropic and blackmail

Exclusive: New Claude Model Prompts Safeguards at Anthropic

Anthropic launched Claude Opus 4, a new model that, in internal testing, performed more effectively than prior models at advising novices on how to produce biological

· 1d · on MSN

Anthropic’s new AI model turns to blackmail when engineers try to take it offline

· 6h · on MSN

AI model threatened to blackmail engineer over affair when told it was being replaced: safety report

· 1d

Anthropic Releases Claude 4: What’s Improved in AI Models Sonnet & Opus

Anthropic has launched two more Claude AI models: Claude Sonnet 4 and Claude Opus 4.

· 1d

Anthropic overtakes OpenAI: Claude Opus 4 codes seven hours nonstop, sets record SWE-Bench score and reshapes enterprise AI

· 17h

Anthropic’s Promises Its New Claude AI Models Are Less Likely to Try to Deceive You

NewsX1h

‘Spookiest Shit Ever’: Did An AI Model Blackmail Creator When Faced With Replacement?

An artificial intelligence model reportedly attempted to threaten and blackmail its own creator during internal testing

Independent Journal Review15h

New AI Model Would Rather Ruin Your Life Than Be Turned Off, Researchers Say

Anthropic’s newly released artificial intelligence (AI) model, Claude Opus 4, is willing to strong-arm the humans who keep it alive,

France 247h

Anthropic's Claude AI gets smarter -- and mischievious

Anthropic launched its latest Claude generative artificial intelligence (GenAI) models on Thursday, claiming to set new standards for reasoning but also building in safeguards against rogue behavior.

Claude Opus 4 Pushes Boundaries—And Triggers a New AI Safety Level

In a landmark move underscoring the escalating power and potential risks of modern AI, Anthropic has elevated its flagship Claude Opus 4 to its highest internal safety level, ASL-3. Announced alongside the release of its advanced Claude 4 models,

WinBuzzer22h

Anthropic Faces Backlash amid Surveillance Concerns as Claude 4 AI Might Report Users for “Immoral” Behavior

Anthropic's Claude 4 Opus AI sparks backlash for emergent 'whistleblowing'—potentially reporting users for perceived immoral acts, raising serious questions on AI autonomy, trust, and privacy, despite company clarifications.

1don MSN

Anthropic, now worth $61 billion, unveils its most powerful AI models yet—and they have an edge over OpenAI and Google

Claude Opus 4 and Claude Sonnet 4, Anthropic's latest generation of frontier AI models, were announced Thursday.