After months of speculation and excitement, OpenAI has officially launched the production version of its advanced reasoning model, now called "o1." A "mini" version has also been introduced, analogous to GPT-4o mini, promising quicker, more responsive interactions built on a smaller knowledge base.
The o1 model brings a range of technical improvements. It is the first of OpenAI's reasoning models, designed to emulate humanlike deduction so that it can work through complex questions across subjects such as science, coding, and math faster than a human can.
For instance, in tests on a qualifying exam for the International Mathematics Olympiad, GPT-4o correctly solved only 13% of the problems, while o1 reached an impressive 83%. In an online Codeforces competition, o1 ranked in the 89th percentile. It can also handle questions that tripped up earlier models, such as determining which number is larger, 9.11 or 9.9. OpenAI cautions, however, that this launch offers only a glimpse of the model's full potential.
The new o1 “has been developed using a unique optimization algorithm and an innovative training dataset crafted specifically for it,” explained Jerry Tworek, OpenAI’s research lead. By employing a combination of reinforcement learning and “chain of thought” reasoning, o1 reportedly generates more precise inferences than its predecessor. “We have observed that this model experiences fewer hallucinations,” Tworek noted, although he cautioned, “we can’t claim we’ve entirely eliminated hallucinations.”
Starting today, both ChatGPT Plus and Team subscribers can explore o1 and o1-mini. Enterprise and Edu subscribers are expected to gain access next week. OpenAI anticipates that o1-mini will eventually be available to free-tier users, but no timeline has been provided. Developers should note that o1's API pricing is significantly higher than GPT-4o's: $15 per million input tokens (versus GPT-4o's $5 per million) and $60 per million output tokens, four times GPT-4o's $15 per million. A curious question remains: how many R's does the new model believe are in the word "strawberry"?
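To put that price gap in concrete terms, here is a small sketch that computes per-request cost from per-million-token rates. The request sizes are hypothetical, and GPT-4o's $15-per-million output rate is taken from OpenAI's published pricing at the time:

```python
# Rough cost comparison between o1 and GPT-4o API pricing.
# Rates are USD per 1M tokens: (input, output).
PRICING = {
    "o1":     (15.00, 60.00),
    "gpt-4o": (5.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request for the given model."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# Hypothetical request: 2,000 input tokens, 1,000 output tokens.
for model in PRICING:
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
# o1:     $0.0900
# gpt-4o: $0.0250
```

At these rates, the example request costs roughly 3.6 times as much on o1 as on GPT-4o, with the output-token premium dominating the difference.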