Reasoning Improvements, Hallucination Issues

Although artificial intelligence has made huge strides, one big problem still lingers with OpenAI's models: hallucinations, moments when the AI confidently presents made-up or wrong information as if it were true. It's a serious issue that undermines how much people can trust AI. Even as OpenAI builds smarter tools, hallucinations keep popping up, especially in its newest models.

Despite AI’s progress, OpenAI models still struggle with hallucinations, confidently delivering false info and undermining trust in this advancing technology.

OpenAI's older models, like o1 and o3-mini, had hallucination rates around 14.8% to 16% on the company's internal PersonQA checks. That's not great, but the newer reasoning models are worse: on the same checks, o3 and o4-mini hallucinate about 30% of the time or more. The tougher SimpleQA factual-accuracy test paints a bleaker picture across the board, with even the advanced GPT-4.5 getting things wrong about 37% of the time, GPT-4o landing near 61.8%, and the smaller, cheaper o3-mini hitting a striking 80.3%. Despite these higher hallucination rates, performance in specific areas like coding and math has still improved.

Why is this happening? Reasoning models, built to think through problems more deeply, tend to make more claims overall. That means they get more answers right in absolute terms, but they also make more things up, as the sketch below illustrates. The exact reasons for the spikes in the newer models aren't clear; OpenAI's own notes say more research is needed to figure it out. It might be tied to how complex these models are, or to training that focuses on reasoning and coding. Separately, educators are concerned about students using AI tools, which has prompted the development of AI detection tools to protect academic integrity.
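To make that trade-off concrete, here is a minimal sketch with made-up numbers (they only loosely echo the roughly 16% and 30% figures above and are not real benchmark data): a model that attempts more claims can be right more often in absolute terms while its hallucination rate still climbs.

```python
# Illustrative only: made-up numbers, not real benchmark results.
# A model that attempts more claims can get more answers right in absolute
# terms while its hallucination rate still climbs.

def summarize(name, attempted, correct):
    wrong = attempted - correct
    print(f"{name}: {correct} correct, {wrong} hallucinated "
          f"({wrong / attempted:.0%} hallucination rate)")

# A cautious model that abstains often vs. a reasoning model that answers more.
summarize("cautious model", attempted=50, correct=42)   # 8 wrong  -> 16% rate
summarize("reasoning model", attempted=90, correct=63)  # 27 wrong -> 30% rate
```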

There's some hope, though. Studies show that bigger, stronger AI models tend to hallucinate less over time, with experts predicting a drop of about 3 percentage points per year, and some even think hallucinations could almost disappear by 2027 with next-generation models. But the recent jumps in rates with new OpenAI tools challenge that idea; the future is uncertain, and more work is needed. Thoughtful UX design can also help users question AI outputs and reduce the risk of misinformation.
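For a sense of scale, here is a rough back-of-the-envelope projection, assuming a roughly 30% starting rate (in the ballpark of the newer models' figures above) and the experts' 3-point annual decline; it shows why that trend alone would leave rates well above zero by 2027, which is why the near-disappearance prediction leans on faster next-generation gains.

```python
# Back-of-the-envelope projection only. The 3-point annual drop is the experts'
# rough trend line quoted above; the 30% starting rate is an assumption in the
# ballpark of the newer models' figures, not an official forecast.

start_rate = 30.0   # assumed starting hallucination rate, in percent
annual_drop = 3.0   # predicted improvement, in percentage points per year

rate = start_rate
for year in range(2025, 2031):
    print(f"{year}: ~{rate:.0f}%")
    rate = max(rate - annual_drop, 0.0)
```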

Testing is key to spotting these issues. Benchmarks like Hugging Face's Hallucination Leaderboard and OpenAI's SimpleQA measure how often models get facts wrong, which makes it possible to compare different AIs and track progress, and public leaderboards let everyone see the results.
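As a rough illustration of how such a benchmark works, here is a minimal SimpleQA-style scoring sketch; the two sample questions, the ask_model() stub, and the exact string-match grader are placeholders rather than OpenAI's actual dataset or grading method.

```python
# Minimal sketch of a SimpleQA-style factuality check. The two questions, the
# ask_model() stub, and the exact string-match grader are placeholders; the
# real benchmark uses a far larger question set and a more robust grader.

def ask_model(question: str) -> str:
    # Stand-in for a real model call (e.g., an API request).
    return "Canberra" if "Australia" in question else "unknown"

dataset = [
    ("What is the capital of Australia?", "Canberra"),
    ("Who wrote the novel Nineteen Eighty-Four?", "George Orwell"),
]

attempted = correct = 0
for question, reference in dataset:
    answer = ask_model(question)
    if answer.lower() == "unknown":
        continue  # abstentions are not scored as hallucinations here
    attempted += 1
    correct += answer.strip().lower() == reference.lower()

hallucination_rate = (attempted - correct) / attempted if attempted else 0.0
print(f"attempted={attempted}, correct={correct}, "
      f"hallucination rate={hallucination_rate:.0%}")
```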

Hallucinations hurt trust in AI. When models make up facts, they can mislead users and distort decisions. OpenAI is still pushing to understand and fix this tricky problem in its latest tech.
