OpenAI Has Launched the World’s First Reasoning-Based AI Model, o1

OpenAI is launching a new model called o1, the first in a series of “reasoning” models designed to tackle complex questions faster than a human can.

Along with this, they are introducing o1-mini, a smaller and more affordable version. For those following AI rumors, this is indeed the much-anticipated Strawberry model.

The release of o1 is a significant move for OpenAI as it aligns with their broader mission of developing human-like artificial intelligence.

On a practical level, the model excels in tasks like writing code and solving multistep problems more effectively than its predecessors.

However, o1 comes with trade-offs—it is more expensive and slower than the GPT-4o model. OpenAI is framing this release as a “preview” to underscore its early-stage nature.

Starting today, ChatGPT Plus and Team users will gain access to both o1-preview and o1-mini. Meanwhile, Enterprise and Edu users can expect to have access early next week. OpenAI also intends to eventually make o1-mini available to all free users of ChatGPT, although no release date has been announced.

Access to the o1 model comes at a steep price for developers. In the API, o1-preview costs $15 per 1 million input tokens (the chunks of text processed by the model) and $60 per 1 million output tokens. For comparison, GPT-4o is significantly cheaper, with input tokens priced at $5 per 1 million and output tokens at $15 per 1 million.
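To put those per-token prices in perspective, here is a minimal sketch of the per-request arithmetic. The function name and the example token counts are illustrative, not part of any OpenAI SDK; only the prices come from the figures quoted above.

```python
# Quoted API prices, in USD per 1 million tokens.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o":     {"input": 5.00,  "output": 15.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Cost in USD of a single request at the quoted per-million-token rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Example: a request with 2,000 input tokens and 1,000 output tokens.
# o1-preview: (2000 * 15 + 1000 * 60) / 1e6 = $0.09
# gpt-4o:     (2000 * 5  + 1000 * 15) / 1e6 = $0.025
```

At these rates, the same request costs roughly 3.6 times more on o1-preview than on GPT-4o.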

According to Jerry Tworek, OpenAI’s research lead, the training process for o1 differs significantly from that of its earlier models.

Although OpenAI is keeping many details under wraps, Tworek mentioned that o1 was developed using “a completely new optimization algorithm” and a specialized training dataset tailored specifically for this model.

OpenAI previously trained GPT models to replicate patterns from their training data. For o1, the approach has shifted to solving problems independently through a method called reinforcement learning, which uses rewards and penalties. The model processes queries using a “chain of thought,” similar to how humans solve problems step-by-step.

This new training approach is expected to improve accuracy. According to Tworek, “We have noticed that this model hallucinates less.” However, he notes that hallucinations are not completely resolved: “We can’t say we solved hallucinations.”

What distinguishes this new model from GPT-4o is its enhanced ability to handle complex problems, including coding and math, while also explaining its reasoning.

OpenAI’s chief research officer, Bob McGrew, mentions, “The model is definitely better at solving the AP math test than I am, and I was a math minor in college.”

He adds that o1 was tested against a qualifying exam for the International Mathematical Olympiad, where GPT-4o solved only 13 percent of the problems correctly, whereas o1 achieved an 83 percent success rate.

In Codeforces online programming competitions, the new model ranked in the 89th percentile of participants. OpenAI also suggests that the next update of this model will perform similarly to PhD students on tough benchmark tasks in physics, chemistry, and biology.

However, o1 isn’t as strong as GPT-4o in several areas. It lacks proficiency in factual knowledge about the world and cannot browse the web or handle files and images. Despite this, OpenAI believes it introduces a new level of capabilities. The model’s name, o1, symbolizes “resetting the counter back to 1.”

McGrew admitted, “I’m gonna be honest: I think we’re terrible at naming, traditionally. So I hope this is the first step of newer, more sane names that better convey what we’re doing to the rest of the world.”

Though I didn’t get to test o1 myself, McGrew and Tworek demonstrated its capabilities to me over a video call this week, where they asked it to solve a puzzle.

“A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present age. What is the age of prince and princess? Provide all solutions to that question.”

The model took about 30 seconds of processing before providing the correct answer. OpenAI designed the interface to display the reasoning steps as the model works through a problem.
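Working the algebra out myself (the answer below is my own check, not taken from OpenAI’s demo), the puzzle pins the ages to a 4:3 ratio. A brute-force sketch over integer ages confirms this:

```python
# Let p = princess's current age, q = prince's current age.
# Innermost clause: at some past moment the princess's age equaled
# half the sum of their present ages: p - t1 = (p + q) / 2.
# Middle clause: at some future moment the princess is twice as old
# as the prince was at that past moment: p + t2 = 2 * (q - t1).
# Outer clause: the princess's present age equals the prince's age
# at that future moment: p = q + t2.

def satisfies(p, q):
    t1 = p - (p + q) / 2       # years since the past moment
    t2 = 2 * (q - t1) - p      # years until the future moment
    return t1 >= 0 and t2 >= 0 and p == q + t2

solutions = [(p, q) for p in range(1, 100)
                    for q in range(1, 100) if satisfies(p, q)]
# Every solution satisfies 3p = 4q, i.e. princess:prince = 4:3
# (for example, ages 8 and 6, or 40 and 30).
```

In other words, there are infinitely many solutions, all with the princess’s age to the prince’s age in a 4:3 ratio.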

What stood out to me wasn’t just that it showed its work — GPT-4o can do this when prompted — but how o1 appeared to imitate human-like thinking. Phrases such as “I’m curious about,” “I’m thinking through,” and “Ok, let me see” gave the impression of a step-by-step thought process.

However, this model isn’t actually thinking, and it certainly isn’t human. So why design it to appear as if it is?

According to Tworek, OpenAI does not equate AI model thinking with human thought. Instead, the interface is designed to illustrate how the model invests more time in processing and delving into problem-solving. He notes, “There are ways in which it feels more human than previous models.”

McGrew adds, “You’ll find that in some aspects it feels quite alien, yet in others, it surprisingly mimics human behavior.”

The model is programmed with a time constraint for processing queries, which might lead it to express urgency, such as, “Oh, I’m running out of time, let me get to an answer quickly.”

At times, during its reasoning process, it may also appear to be brainstorming, with phrases like, “I could do this or that, what should I do?”

McGrew states, “We have spent many months focusing on reasoning because we believe this is the key breakthrough.” He explains that this represents a new modality for models, essential for tackling the complex problems necessary for advancing toward human-like intelligence.
