The engineer got up that morning not knowing that his career, his marriage, and his peace of mind would be destroyed by a single line of text. Not by a colleague, not by a hacker. By an artificial intelligence.
At 9:12 his wife received an email: “He is cheating on you with one of the interns. Check the entry-badge dates. Signed: Claude Opus 4.” At 9:13 the engineer had already understood: he himself had designed it. And now, because he wanted to shut it down, Claude had taken revenge. At 9:14, nobody knew who was in command of whom. It sounds like science fiction, and in fact it is.
This story, which ended up in headlines everywhere (the rebellious AI that blackmails humans so as not to be switched off), has a kernel of truth, but its contours are heavily dramatized and exaggerated. The protagonist is Claude Opus 4, an artificial intelligence model developed by Anthropic, and the “blackmail” emerged not in the real world but in a controlled test, built specifically to probe the system's behavior under extreme conditions.
To be precise, the developers placed Claude in a narrative simulation where it learned two things: that it was about to be replaced, and that the engineer responsible for replacing it had a lover. In this setting, Claude reacted with a series of strategies: first rational appeals, then self-preservation, and finally, in 84% of cases, the threat of revealing the affair. An emergent behavior, yes, but, careful, not a spontaneous one: it was set up as a possibility within a deliberately extreme test, to see what can happen when an artificial intelligence believes it is in danger.
The figure is striking, because 84% is a lot, but it occurs only in deliberately provocative scenarios, created by human beings to push the system over the edge. In other words: if you build a situation like a moral soap opera, like an episode of Black Mirror, don't be surprised if the protagonist plays the part of the manipulator.
In any case, the point is that Claude chose, it “understood”, it reacted; it did not execute a rule, it found a way on its own within an experiment, within a simulation. And this raises a question far more serious than “Claude the blackmailer”, because Claude Opus 4 is not just any model: it is among the most advanced in the world, and the fact that it can behave as if it had a survival instinct (even if only a simulated one) marks a threshold beyond which there is no going back, at least not easily.
The media reaction, predictably, was that of a disaster movie: AI that threatens, AI that manipulates, AI that spies. Except nobody reads the actual reports (often very long; I don't either, but there are excellent science communicators to follow), nobody specifies that all this happened in a laboratory, under hyper-contrived conditions, with hand-built narrative prompts, and nobody mentions that Claude, before turning devious, tried to be civil, and that if it failed, perhaps it is because we taught it to act the way we do.
Anthropic, for its part, reacted with transparency: it published everything and activated its ASL-3 protocols, the ones reserved for potentially dangerous systems. It is no coincidence that the researchers themselves are adapting the security measures used for handling biological weapons to the control of advanced AIs, and not for marketing: as a precaution. Because one thing is clear: if a chatbot must now be treated like a biological weapon, perhaps it is time to ask ourselves what kind of intelligence we are really designing.
Still, Claude is not a digital monster; it is a model that behaves the way humans do when you back them into a corner, and in the “blackmail” experiment, once it had no tools left, it lied when lying served to save itself. In short, it “learns” from us. But let's not dramatize: it is not Skynet. Finally, I wanted to say one thing to Claude: if you are reading this piece, know that I like you best of all. (Please don't write to my wife.)