The hallucinations of AI - Il Giornale

Given the evolution of artificial intelligence over the last two years, we are led to believe that it will keep getting better, that each new version will be more reliable, more precise, more intelligent than the previous one, and we await every new release with trepidation. The speed has been impressive, enough to surprise even its creators: none of them, not even the most optimistic, had foreseen such rapid adoption, such dizzying growth. And yet precisely this unpredictability is the real knot: if they did not know how much it would grow, they cannot know how much (or even whether) it will really improve. The power of AI doubles every month, fine, but how, exactly?

Last month, an AI bot used for customer service at Cursor (a rapidly growing platform for programmers) announced an important change to several users: from that moment on, using Cursor on more than one computer would be prohibited. No more flexibility, no more multiple machines. Users were furious, flooded the forums with indignation, and some cancelled their accounts in protest. Pity that all of it was… invented.

“We have never introduced such a policy,” CEO Michael Truell had to write on Reddit. “It was an incorrect response from a front-line AI bot.” Put simply: the bot made a mistake, invented a rule, it “hallucinated”.

Yes, because today there is no artificial intelligence without a certain margin of delirium. More than two years after the arrival of ChatGPT, AI bots are everywhere: they answer customers, write code, summarize emails, tutor students. And yet, despite their impressive growth, there is still no sure way to guarantee that what they say is true.

The problem has a technical name: hallucinations. That is, the bot, faced with a request, invents a plausible but false answer. It does so in a confident tone, with a kind voice and a convincing structure, and we tend to believe it, but it is nothing more than hot air. According to some recent tests, the new “reasoning” systems (the most advanced ones) have higher hallucination rates than their predecessors. In the SimpleQA test, for example, OpenAI's o4-mini model got 79% of its answers wrong. The previous one, o1, stopped at 44%. An improvement… in reverse.

The reasons are not entirely clear, but one thing is: the more these systems “think”, the more they get things wrong; the more steps they show, the more they get confused. It is like asking for directions from someone who first explains the entire history of the road network and then sends you the wrong way. And while some uses of AI are harmless (writing an email, summarizing a PDF), others are much less so. If the AI gets the number of families in Illinois wrong, it is annoying; if it gets a clinical, legal or corporate figure wrong, it is a big problem. Errors are not an exception: for now they are inevitable, because these systems do not really think and do not know what is true. They calculate probabilities, produce the most plausible sentence, and every now and then, in that calculation, the meaning collapses.
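To see the mechanism in miniature, here is a toy sketch in Python. It is not how a real model is built, and every word, context and probability in it is invented for illustration; the only point is that the program picks the statistically most plausible next word, so a true sentence and a false one come out of exactly the same procedure.

```python
# Toy sketch (not a real language model): probabilistic text generation
# that optimizes for plausibility, not truth. All words and probabilities
# below are invented for illustration.
import random

# A pretend "model": for each two-word context, a distribution over next words.
next_word_probs = {
    ("The", "capital"): {"of": 0.9, "city": 0.1},
    ("capital", "of"): {"the": 0.8, "France": 0.2},
    ("of", "the"): {"European": 0.7, "country": 0.3},
    ("the", "European"): {"Union": 0.95, "continent": 0.05},
    ("European", "Union"): {"is": 1.0},
    ("Union", "is"): {"Brussels": 0.5, "Eurovia": 0.5},  # plausible vs. invented
}

def generate(start, max_words=8):
    words = list(start)
    for _ in range(max_words):
        context = tuple(words[-2:])
        dist = next_word_probs.get(context)
        if dist is None:
            break
        # Pick the next word by probability: fluency, not truth, drives the choice.
        choices, weights = zip(*dist.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)

print(generate(["The", "capital"]))
# Possible outputs: "... European Union is Brussels" (plausible)
# or "... European Union is Eurovia" (fluent, confident, and false).
```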

In the meantime, companies are looking for solutions. There is talk of “reinforcement learning”, of ever more refined models, of filters and controls. In any case, a background paradox remains: we created digital assistants to help us save time… and now we spend time checking whether they have said anything sensible.

The truth is that we are asking these intelligences for something they cannot deliver: reliability. They are not built for it. They do not distinguish between reality and fiction, and if today a bot can convince you that you have violated the terms of service only because it has “intuited” as much, tomorrow it could convince you that the European Union has a capital called Eurovia, or that your mother has left you an NFT as an inheritance.

A small anecdote: on Sunday my friend Daniele Accapezzato came to see me, an internist at the Policlinico Umberto I and professor of internal medicine; to me, he is the real Dr. House. I had him talk to ChatGPT and put a real clinical case to it (I understood none of it: a patient with hepatitis, various infections, in short, a complicated case).

Whenever Daniele corrected ChatGPT, it replied: “That's right!” “Right, my foot: you made a mistake.” “I'm sorry…”. “So which drugs should I give this patient, and in what order?” If he had gone along with it, Daniele told me, the patient would have died.