OpenAI has unveiled its latest snack-sized generative model, GPT-4o mini, designed to be less resource-intensive and more cost-effective than the standard GPT-4o. The lower cost should let developers incorporate AI into a broader array of products, and the model also upgrades the free version of ChatGPT by easing usage limits. GPT-4o mini is now available to users on the Free, Plus, and Team tiers through the ChatGPT web and mobile apps, with ChatGPT Enterprise subscribers set to gain access next week. Starting today, GPT-4o mini replaces OpenAI's existing small model, GPT-3.5 Turbo, for end users.
The older model remains accessible through the API for developers who prefer to hold off on transitioning to GPT-4o mini, but OpenAI plans to retire GPT-3.5 Turbo eventually, though it has not set a specific timeline.
Since May, GPT-4o has been available to free ChatGPT accounts, albeit with certain limitations due to high demand. The updated FAQ clarifies that while GPT-4o still faces these restrictions, users will now automatically switch to GPT-4o mini instead of GPT-3.5 upon reaching their limits. This change is beneficial for users who have not upgraded to ChatGPT Plus.
With the introduction of GPT-4o mini, OpenAI aims to make AI more accessible to all users; the model is available in the API now and rolling out in ChatGPT. According to data from Artificial Analysis, the new model scored an impressive 82% on the MMLU reasoning benchmark, about three percentage points ahead of Gemini 1.5 Flash and seven ahead of Claude 3 Haiku. For context, the current MMLU record is held by Gemini Ultra, Google's leading model, with a score of 90%.
Importantly, OpenAI reports that GPT-4o mini costs 60% less to run than GPT-3.5 Turbo. Developers will pay 15 cents per million input tokens and 60 cents per million output tokens. OpenAI touts GPT-4o mini as "the most capable and cost-efficient small model available today," according to CNBC. The savings reflect the fact that many AI-enhanced tasks do not require the full capabilities of a large model like GPT-4o, Claude, or Gemini. Employing a full-sized large language model (LLM) for simple, high-volume tasks is often unnecessarily expensive and resource-intensive. This is where smaller LLMs, such as Google's Gemini 1.5 Flash, Meta's Llama 3 8B, or Anthropic's Claude 3 Haiku, become advantageous, executing these tasks more swiftly and economically than their larger counterparts.
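At those published rates, estimating a bill is simple arithmetic. Here is a minimal Python sketch; the token counts and request volume below are hypothetical examples chosen for illustration, not figures from OpenAI:

```python
# Published GPT-4o mini API rates (USD per million tokens).
INPUT_RATE = 0.15   # $0.15 per 1M input tokens
OUTPUT_RATE = 0.60  # $0.60 per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of a single API call at GPT-4o mini rates."""
    return (input_tokens / 1_000_000) * INPUT_RATE + \
           (output_tokens / 1_000_000) * OUTPUT_RATE

# A hypothetical high-volume workload: one million requests,
# each with a 500-token prompt and a 200-token reply.
per_request = estimate_cost(500, 200)
print(f"${per_request:.6f} per request")          # $0.000195 per request
print(f"${per_request * 1_000_000:,.2f} total")   # $195.00 total
```

At these prices, a workload of a million short requests lands under $200, which is the kind of economics that makes a small model attractive for simple, high-volume tasks.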
OpenAI further indicated that GPT-4o mini retains the full-size model's 128,000-token context window (approximately the length of a book) and its October 2023 knowledge cutoff, though the company has not disclosed the new model's parameter count. The API currently supports text and vision, with video and audio capabilities planned for the future. The announcement follows OpenAI's recent update on its highly anticipated Voice Mode, built on GPT-4o: a smaller alpha version is expected to launch in late July, with a broader rollout anticipated this fall.
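For developers already on GPT-3.5 Turbo, the migration is largely a matter of pointing Chat Completions requests at the new model name. A hedged sketch of what such a request body might look like, shown as a plain dict rather than a live API call (the prompt text and parameter values are invented examples):

```python
import json

# Chat Completions request body targeting the new small model.
# Swapping in "gpt-4o-mini" where "gpt-3.5-turbo" used to appear is
# the core of the migration; the other fields keep their usual meaning.
request_body = {
    "model": "gpt-4o-mini",  # formerly "gpt-3.5-turbo"
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize this ticket in one line."},
    ],
    "max_tokens": 100,  # cap the reply length, and thus the output cost
}
print(json.dumps(request_body, indent=2))
```

Because the request shape is unchanged, existing GPT-3.5 Turbo integrations should need little beyond the model-name swap, which is presumably why OpenAI can retire the older model on its own schedule.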