I'll explain the worst limitation of ChatGPT

With a post on its blog, OpenAI announces the launch of its new artificial intelligence model with “reasoning” capabilities. Available in two variants – o1-preview (the most powerful model, designed to handle complex tasks and advanced challenges) and o1-mini (a more accessible option that offers faster response times and lower costs, ideal for less difficult problems) – the new ChatGPT model, “OpenAI o1,” promises significant improvements in the ability to “reason” and solve problems compared to previous large language models.

As with the previous GPT-4o (launched last July 18, again with a post on the official blog of the company led by Sam Altman), OpenAI’s goal remains to “build and deploy AI safely and make it widely accessible,” and “making intelligence available at a lower cost is one of the most efficient ways to do it,” in the words to Wired US of Olivier Godement, the company’s product manager responsible for the new model (codenamed Strawberry).

What’s different about the new chatbot

OpenAI o1 promises improvements in writing code and solving multi-step problems. OpenAI trained it using reinforcement learning – a machine learning technique in which an agent learns a task through trial and error – which allows the model to “think” about a problem longer before responding, trying out various strategies and recognizing its own mistakes. This reinforcement learning (RL) approach effectively makes the new artificial intelligence model capable of working through problems step by step (as we humans are used to doing).
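The trial-and-error idea behind reinforcement learning can be illustrated with a toy example. The sketch below is not OpenAI's actual training setup (which is not public); it is a minimal epsilon-greedy bandit, the simplest classic RL loop, in which an agent repeatedly picks an action, observes a noisy reward, and gradually learns which action works best – all names and parameters here are illustrative:

```python
import random

def train_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Trial-and-error learning on a multi-armed bandit:
    the agent tries actions, observes noisy rewards, and
    gradually prefers the action with the highest estimated value."""
    rng = random.Random(seed)
    n = len(true_rewards)
    estimates = [0.0] * n  # current value estimate per action
    counts = [0] * n       # how often each action was tried
    for _ in range(steps):
        # explore with probability epsilon, otherwise exploit the best estimate
        if rng.random() < epsilon:
            action = rng.randrange(n)
        else:
            action = max(range(n), key=lambda a: estimates[a])
        # the environment returns a noisy reward for the chosen action
        reward = true_rewards[action] + rng.gauss(0, 0.1)
        counts[action] += 1
        # incremental mean update: estimate moves toward observed rewards
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

if __name__ == "__main__":
    # three actions with hidden average rewards 0.2, 0.8, 0.5
    est = train_bandit([0.2, 0.8, 0.5])
    best = max(range(3), key=lambda a: est[a])
    print("best action:", best, "estimates:", [round(e, 2) for e in est])
```

The same principle – reward good outcomes, penalize bad ones, repeat – is what, at a vastly larger scale, steers a model like o1 toward reasoning strategies that lead to correct answers.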

User expectations? They are high, so much so that Joanne Jang, product manager at OpenAI, tried to contain them in a post on X (formerly Twitter): “There is a lot of enthusiasm about o1, but this could create the wrong expectations. o1 excels at very complex tasks and will continue to improve.” Ethan Mollick, a professor at the Wharton School of the University of Pennsylvania who tested the new large language model, agrees – and calls it “fascinating.”

We are talking about “a technology that is evolving rapidly, with enthusiastic announcements following one another. In this, I believe there is a need to maintain the hype and continue to attract investors,” Nicola Mazzari, associate professor at the “Tullio Levi-Civita” Department of Mathematics of the University of Padua, explains to Today.it. He points out: “Undoubtedly the models already available allow us to automate many previously inaccessible operations, but I don’t think this is a turning point.”

AI does not solve new problems (yet)

An intriguing test for the new model is the comparison with GPT-4o in the field of mathematics. In a qualifying exam for the International Mathematical Olympiad (whose 2024 edition took place in the United Kingdom), GPT-4o solved only 13 percent of the problems correctly, while o1 solved 83 percent. “It doesn’t surprise me,” Mazzari comments, “we are talking about models trained for that objective, and many tests are repetitive. We train for the Olympiad precisely because there are techniques that can be learned to ‘attack’ the questions. In mathematical research, instead, what makes the difference is the ability to develop a new concept to solve a problem.”

So, will there be many tasks that can be performed using artificial intelligence? “Of course,” the professor replies, “for example, it will be possible to scan a bibliography in a sophisticated way and easily visualize the connections between different works. At present, however, I don’t see the ability to solve new problems and propose unexplored concepts.” Indeed, o1 also manages to solve textual puzzles with precision. But, because of its different training, the new model falls behind GPT-4o in more than one context, from web browsing to file and image processing.

When artificial intelligence makes mistakes

The introduction of OpenAI o1 (a model whose genesis the company recounts in a video on its YouTube channel) nevertheless marks a turning point in the development of AI capabilities, raising the bar on what machines can understand (and helping to show how they can interact in solving complex problems). The company was founded in 2015 by a group of AI experts, including Elon Musk.

Nonetheless, some users report (as anticipated) that o1 does not outperform its predecessor GPT-4o in every metric and that response times can be slow. Speaking of “approach,” we ask Mazzari what his own approach to artificial intelligence is. “I use these tools and have my students use them. For example, to write the corrections for my basic exams: everything was fine in the standard exercises, but so far the bot has always collapsed on the more subtle questions that required a bit of cunning.” At this point, what does he suggest to his students? “I tell them to use ChatGPT as if it were a tutor, and then to comment together on the limitations of the answers (which have undoubtedly improved). I think it is useful, also because the AI often makes errors similar to those of the students,” the professor replies.