OpenAI has recently made waves with the launch of a new model called o3, hailed as the company's strongest model to date and attracting attention across the AI world.
What makes o3 so remarkable? It leans on a technique known as "test-time compute." Put simply, the model works like a cautious overachiever: when faced with a problem, it doesn't rush to an answer but spends extra time at inference "pondering," exploring multiple candidate solutions before committing. The design aims to minimize the risk of a wrong answer, much like a student who double-checks every detail before handing in an exam paper. OpenAI's engineers hope this deliberate approach will let the model handle even the most complex prompts, giving users responses that meet or exceed expectations.
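To make the idea concrete, here is a toy sketch of one common form of test-time compute: sample many independent answers and take a majority vote. Everything here (the stub "model," the function names, the error rate) is invented for illustration; o3's actual reasoning mechanism has not been published.

```python
import random
from collections import Counter

def sample_answer(question, rng):
    # Stand-in for one stochastic "reasoning pass" of a model:
    # it usually gets the answer right, but errs 30% of the time.
    return 4 if rng.random() < 0.7 else rng.choice([3, 5])

def answer_with_test_time_compute(question, n_samples, seed=0):
    """Spend extra compute at inference time: draw many candidate
    answers and return the one the samples agree on most often."""
    rng = random.Random(seed)
    candidates = [sample_answer(question, rng) for _ in range(n_samples)]
    answer, _count = Counter(candidates).most_common(1)[0]
    return answer
```

With a single sample the stub is right only ~70% of the time; with 101 samples the majority vote is almost always correct. This is also why the cost scales so steeply: every extra unit of reliability is bought with many more forward passes.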
Sure enough, in the most demanding "high-compute mode," o3 passed the ARC-AGI benchmark with flying colors, scoring an impressive 87.5%. This performance left previous models in the dust: just three months earlier, OpenAI's o1 model had managed only 32%, meaning o3's score is nearly three times as high—a major leap forward in AI capability.
However, as the saying goes, "you get what you pay for." o3's remarkable performance comes at a hefty price. In high-compute mode, each task cost more than $1,000 and consumed roughly 170 times the compute of the low-compute configuration—compared to earlier models, which cost less than $4 per task, the difference is like comparing the sky to the ground. Even the relatively "affordable" low-compute version of o3 still ran around $20 per task in the benchmark, far higher than previous versions. Set that against ChatGPT Plus, which charges users only $20 per month for access to a range of services, and the cost disadvantage becomes even more pronounced. This raises a critical question: how much smarter can user-facing products get without driving OpenAI into the red? It's a tricky balancing act.
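A quick back-of-envelope calculation shows why the economics are so strained. The figures below are the ones cited in the reporting above (benchmark-run estimates, not official pricing), with ChatGPT Plus assumed at the commonly cited $20/month:

```python
# All figures are rough benchmark-run estimates from the article, not official pricing.
HIGH_COMPUTE_COST_PER_TASK = 1000.0  # dollars per task, o3 high-compute mode
LOW_COMPUTE_COST_PER_TASK = 20.0     # dollars per task, o3 low-compute mode
PLUS_REVENUE_PER_MONTH = 20.0        # dollars, one ChatGPT Plus subscription

# How many benchmark-style tasks one month of subscription revenue would cover:
high_tasks = PLUS_REVENUE_PER_MONTH / HIGH_COMPUTE_COST_PER_TASK
low_tasks = PLUS_REVENUE_PER_MONTH / LOW_COMPUTE_COST_PER_TASK
print(high_tasks)  # 0.02 -> one subscription funds 1/50th of a single high-compute task
print(low_tasks)   # 1.0  -> exactly one low-compute task per month
```

In other words, at these estimated costs a single subscriber's monthly fee would not cover even one hard query in high-compute mode—which is the tension the paragraph above describes.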
Looking ahead, o3's future presents both challenges and opportunities. In a blog post explaining the benchmark results, François Chollet, the creator of the ARC-AGI benchmark, was candid about the model's strengths and weaknesses. While he acknowledged that o3's performance is approaching human-level capability—truly impressive—the costs are simply too high. The model is like a high-performance sports car: incredibly powerful but expensive to run, and in its current form it does not meet an "economically viable" standard for widespread use. Chollet pointedly noted that hiring a person to solve the ARC-AGI tasks would cost around $5 per task, with energy consumption of only a few cents—truly "affordable and effective."
Despite these concerns, Chollet expressed confidence that within months or years the cost-to-performance ratio of o3 will improve dramatically, as falling compute prices and better techniques tend to do. OpenAI plans to release a "mini" version of o3 in January, offering the public a "taste test" before the full model is unveiled—a clever strategy to build interest and buy time to address the high operational costs.
The launch of o3 represents both a breakthrough in AI performance and a cautionary tale about the high costs of cutting-edge technology. While the model’s capabilities are a significant leap forward, its expense highlights the ongoing challenge of making advanced AI systems not only powerful but also economically viable. OpenAI’s strategy will likely involve scaling these models over time, driving down costs while improving efficiency. For now, the company is balancing the excitement of innovation with the reality of financial sustainability, a task that will likely define its future trajectory in the AI space.