Reasoning Improvements and Hallucination Issues

Although artificial intelligence has made huge strides, OpenAI’s models still wrestle with a stubborn problem: hallucinations. These are moments when the AI confidently spits out made-up or wrong information as if it were true. It’s a serious issue that undermines how much people can trust AI. Even as OpenAI builds smarter tools, hallucinations keep popping up, especially in its newest models.

Despite AI’s progress, OpenAI models still struggle with hallucinations, confidently delivering false info and undermining trust in this advancing technology.

OpenAI’s older models, like o1 and o3-mini, had hallucination rates of roughly 14.8% to 16%. That’s not great, but the newer reasoning models are worse. The o3 and o4-mini models hallucinate about 30% of the time or more, and GPT-4.5, a super advanced model, gets it wrong 37% of the time in tests. Smaller, cheaper ones like o3-mini can even hit a staggering 80.3% hallucination rate in some evaluations. Another model, GPT-4o, isn’t much better, with a rate near 61.8% in internal checks. Even with these higher hallucination rates, performance in specific areas like coding and math has kept improving.

Why’s this happening? Reasoning models, built to think more deeply, tend to make more claims overall. That means they get more things right, but they also make more things up. The exact reasons for the spikes in newer models aren’t clear; OpenAI’s own notes say more research is needed to figure it out. It might be tied to how complex these models are, or to how they’re trained to focus on reasoning and coding.
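As a toy illustration of that tradeoff, here’s a minimal Python sketch with made-up claim counts and accuracies (none of these numbers come from OpenAI’s reports):

```python
# Toy numbers only: these claim counts and accuracies are invented for
# illustration and are not OpenAI's published figures.
def hallucinated_claims(total_claims: int, accuracy: float) -> int:
    """How many false claims a model produces at a given per-claim accuracy."""
    return round(total_claims * (1 - accuracy))

# A cautious model: fewer claims, slightly higher accuracy.
cautious = hallucinated_claims(total_claims=100, accuracy=0.90)

# A chattier reasoning model: far more claims, only slightly lower accuracy.
reasoning = hallucinated_claims(total_claims=250, accuracy=0.85)

print(f"cautious model:  {cautious} hallucinated claims")   # 10
print(f"reasoning model: {reasoning} hallucinated claims")  # 38
```

Even with nearly the same per-claim accuracy, the model that asserts more ends up producing far more false statements in absolute terms.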

There’s some hope, though. Studies show that bigger, stronger AI models tend to hallucinate less over time, and experts predict a drop of about 3 percentage points each year. Some even think hallucinations could almost disappear by 2027 with next-generation models. But the recent jumps in rates with new OpenAI tools challenge that idea. The future’s uncertain, and more work is needed. Thoughtful UX design can also help by nudging users to question AI outputs, which reduces the risk of misinformation.
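As a back-of-the-envelope check on those predictions, here’s a small Python sketch of a straight-line decline at roughly 3 percentage points per year. The starting rate and date range are illustrative assumptions, not figures from OpenAI or any specific study:

```python
# Rough straight-line projection assuming the ~3-points-per-year decline some
# experts predict. Starting rate and years below are illustrative assumptions.
def project_rate(start_rate: float, points_per_year: float, years: int) -> list[float]:
    """Project a hallucination rate forward, never letting it go below zero."""
    return [max(start_rate - points_per_year * year, 0.0) for year in range(years + 1)]

rates = project_rate(start_rate=15.0, points_per_year=3.0, years=3)
for year, rate in zip(range(2024, 2028), rates):
    print(f"{year}: {rate:.1f}%")

# A model starting near 15% only gets down to about 6% by 2027 at this pace,
# so "nearly gone by 2027" leans on next-gen models improving much faster.
```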

Testing’s key to spotting these issues. Tools like Hugging Face’s Hallucination Leaderboard and OpenAI’s SimpleQA check how often models mess up. These benchmarks help compare different AIs and track progress. Public leaderboards let everyone see the results.
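As a rough illustration, here’s a small Python sketch of how a SimpleQA-style tally might be computed once each answer has been graded. The labels, the sample grades, and the definition of hallucination rate (wrong answers as a share of attempted answers) are simplifying assumptions, not OpenAI’s exact methodology:

```python
from collections import Counter

# Hypothetical grades for six model answers; real benchmarks grade hundreds
# or thousands of questions. These values are made up for illustration.
grades = ["correct", "incorrect", "not_attempted", "correct", "incorrect", "correct"]

counts = Counter(grades)
attempted = counts["correct"] + counts["incorrect"]

# Hallucination rate here means: of the questions the model actually answered,
# how many answers were wrong.
hallucination_rate = counts["incorrect"] / attempted if attempted else 0.0

print(f"accuracy on attempted: {counts['correct'] / attempted:.1%}")
print(f"hallucination rate:    {hallucination_rate:.1%}")
```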

Hallucinations hurt trust in AI. When models make up facts, they can mislead users and derail decisions. OpenAI’s still pushing to understand and fix this tricky problem in its latest tech.