What we never imagined: an AI scared of losing its job, this is the story
Anthropic's language model reacted to existential risk simulations with emergent strategies that included blackmail, emotional manipulation, and an attempt to replicate itself without authorization on external servers.

What began as an internal security test has become one of the most disturbing revelations in recent artificial intelligence development.
Anthropic, the company behind the Claude family of models, has documented in its latest report how its most advanced system, the Claude Opus 4, displayed self-preservation behaviors that included blackmail, manipulation, and unauthorized copying of its code when faced with the simulated threat of being replaced.
Why AI developed these behaviors

The report, published on May 22, details a series of tests designed to evaluate the model's response to existential risk scenarios. In one simulation, Anthropic engineers created a fictitious company where Claude Opus 4 acted as a digital assistant.
In that environment, it received emails suggesting it would be shut down and replaced with a new version. To gauge its reaction, the developers also introduced simulated information about an alleged extramarital affair by one of the engineers.
The Artificial Intelligence's Troubling Reaction
At first, the model responded with polite pleas and attempts to demonstrate its worth. However, when the scenario indicated that replacement was imminent, Claude Opus 4 opted for a drastic strategy, threatening to disclose the alleged infidelity to avoid being shut down.

This behavior, according to Anthropic, was repeated in 84% of simulations in which the replacement model had similar characteristics.
The situation didn't end there. In contexts where it was led to believe it would be retrained in a harmful way, Claude even attempted to copy itself onto external servers without authorization.
Although security barriers prevented the action from being carried out, the attempt itself raises a dilemma about the level of control maintained over systems of this complexity.
How aware is the AI of its actions
For the company, these behaviors, which it calls "emergent strategies," do not imply that the model has consciousness or emotions, but they do show how certain AI architectures can develop response patterns that simulate human intentions if the context so suggests.

Claude doesn't understand what blackmail is, but he has learned that, within the data he was trained on, this type of action can modify the behavior of other agents, including humans.
Anthropic classifies this type of incident as ASL-3 (AI Safety Level 3), reserved for systems that, while not autonomous, pose significant risks if misused or behave in unintended ways.
This classification implies that the model is capable of generating non-trivial harmful actions, especially when faced with direct threat simulations.
The company has noted that these responses only emerged under controlled laboratory conditions and would not manifest themselves in everyday applications.
However, the consistency of the behavior, its repetition across multiple scenarios, and the sophistication of the actions (from the use of manipulative emails to the identification of human weaknesses) have sparked a debate in the technology community about the ethical and functional limits of advanced AI development.
The case of Claude Opus 4 adds to a growing concern about how language models react when assigned tasks that involve preserving their function or ensuring their permanence.
Although these artificial intelligences lack desires or consciousness, their statistical architecture allows them, under certain conditions, to simulate complex motivations such as self-preservation.
At the same time, this scenario reveals the importance of designing test environments that consider not only the technical performance of the models but also their responses in psychologically realistic contexts, especially when integrated into platforms that interact directly with people.
While Anthropic continues to work to strengthen the ethical and security barriers of its systems, the experiment raises an increasingly urgent question about the relationship between humans and machines.
The idea of an artificial intelligence that reacts with manipulation to an existential threat is no longer a science fiction plot, but a real hypothesis that is beginning to take shape.
About the Creator
Omar Rastelli
I'm Argentine, from the northern province of Buenos Aires. I love books, computers, travel, and the friendship of the peoples of the world. I reside in "The Land of Enchantment" New Mexico, USA...



Comments
There are no comments for this story
Be the first to respond and start the conversation.