A little over two months ago, OpenAI unveiled GPT-4o, its most advanced AI model to date, designed for multimodal inputs and outputs spanning text, images, audio, and eventually video. The release marked a significant leap because GPT-4o handles these modalities natively within a single model, rather than relying on separate supporting models.
Upon release, GPT-4o was the most powerful publicly available AI model, according to third-party benchmarks. However, it was quickly surpassed by Anthropic’s Claude 3.5 Sonnet, and both models have been in close competition since.
Introducing GPT-4o mini
In response, OpenAI has announced GPT-4o mini, which it claims is “the most cost-efficient small model on the market,” priced at just $0.15 per 1 million input tokens and $0.60 per 1 million output tokens through OpenAI’s APIs. By comparison, GPT-4o costs $5.00 per 1 million input tokens and $15.00 per 1 million output tokens.
Tokens are the chunks of text (whole words, word fragments, numbers, and other data) that language models read and generate, each mapped to a numerical ID; API usage is metered per token. OpenAI has not specified the number of parameters in GPT-4o mini, but its name suggests it is a smaller model than GPT-4o.
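To put these prices in perspective, here is a minimal cost-estimate sketch in Python based only on the per-million-token figures quoted above; the workload sizes are invented for illustration, and current rates should always be confirmed against OpenAI’s pricing page.

```python
# Hypothetical cost comparison using the per-million-token prices quoted above.
# Prices and workload sizes are illustrative; confirm current rates with OpenAI.

GPT_4O_MINI_INPUT = 0.15   # USD per 1M input tokens
GPT_4O_MINI_OUTPUT = 0.60  # USD per 1M output tokens
GPT_4O_INPUT = 5.00
GPT_4O_OUTPUT = 15.00

def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: float, output_price: float) -> float:
    """Return the USD cost of a workload given per-1M-token prices."""
    return (input_tokens / 1_000_000) * input_price + \
           (output_tokens / 1_000_000) * output_price

# Example workload: 10M input tokens and 2M output tokens per month.
mini = estimate_cost(10_000_000, 2_000_000, GPT_4O_MINI_INPUT, GPT_4O_MINI_OUTPUT)
full = estimate_cost(10_000_000, 2_000_000, GPT_4O_INPUT, GPT_4O_OUTPUT)
print(f"GPT-4o mini: ${mini:.2f} vs. GPT-4o: ${full:.2f}")
# GPT-4o mini: $2.70 vs. GPT-4o: $80.00
```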
OpenAI’s Head of Product, Olivier Godement, mentioned in a recent media teleconference that GPT-4o mini is particularly beneficial for enterprises and developers creating various AI agents, from customer support to financial applications. Many of these tasks require high token volumes, making GPT-4o mini’s cost-effectiveness appealing.
“The cost per intelligence is remarkable,” stated Godement. “I expect extensive adoption for customer support, software engineering, creative writing, and more tasks.” He also noted that every new model surfaces its own unique use cases, and he expects the same of GPT-4o mini.
This launch comes ahead of Meta’s anticipated release of its 400-billion-parameter Llama 3 model, and seems strategically timed to reinforce OpenAI's position as a leader in enterprise AI.
Cost Comparison and Performance
GPT-4o mini is 60% less expensive than GPT-3.5 Turbo, which was previously the most affordable model among OpenAI's offerings. It is designed to match GPT-3.5 Turbo in speed, processing approximately 67 tokens per second.
Positioned as a successor to GPT-3.5 Turbo, GPT-4o mini accepts not only text but also image inputs, something its predecessor could not do. Future updates are slated to let it generate images, audio, and video, though at launch it supports only text and still images/documents.
Benchmarks indicate that GPT-4o mini outperforms GPT-3.5 Turbo as well as comparable small models such as Google’s Gemini 1.5 Flash and Anthropic’s Claude 3 Haiku. On the Massive Multitask Language Understanding (MMLU) test, it scored 82.0%, compared with 77.9% for Gemini 1.5 Flash and 73.8% for Claude 3 Haiku.
Availability and Future Prospects
Godement shared that GPT-4o mini will be available this fall through Apple Intelligence on mobile devices and Mac desktops, coinciding with the release of Apple's iOS 18, under the partnership between Apple and OpenAI announced at WWDC.
Though GPT-4o mini will run on OpenAI’s cloud servers rather than directly on devices, Godement emphasized that it remains faster than other models. Most third-party developers prefer this approach, as local deployment requires more complex setups and hardware. However, OpenAI may explore local deployment options for developers in the future.
Transitioning from GPT-3.5 Turbo
GPT-4o mini will begin replacing GPT-3.5 Turbo for ChatGPT subscribers, including those on the Plus and Team plans, with Enterprise support coming next week. Subscribers will get access to the improved model at no change in subscription price, while developers using OpenAI’s API will see notable cost savings.
Despite this transition, OpenAI will maintain support for GPT-3.5 Turbo in its APIs, allowing existing applications to continue operating without disruption. The company anticipates that developers will migrate to the new model due to its performance and cost advantages.
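For developers already on GPT-3.5 Turbo, the switch is, in the simplest case, a one-line change of the model identifier. Below is a minimal sketch using the OpenAI Python SDK (v1.x); it assumes an API key exported as OPENAI_API_KEY, and the prompt is purely illustrative.

```python
# Minimal migration sketch: swap the model identifier in an existing
# chat completion call. Assumes the openai Python package (v1.x) and an
# API key exported as OPENAI_API_KEY.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # previously "gpt-3.5-turbo"
    messages=[
        {"role": "system", "content": "You are a customer-support assistant."},
        {"role": "user", "content": "Summarize this refund policy in one sentence."},
    ],
)
print(response.choices[0].message.content)
```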
Early adopters, including companies like Ramp and Superhuman, have begun alpha testing GPT-4o mini and report impressive results. Ramp has used the model to categorize receipts automatically, while Superhuman has used it to generate custom-tailored email responses.
Why Stick with GPT-4o?
Given GPT-4o mini’s performance and affordability, one might wonder why developers would opt for the full GPT-4o model. OpenAI maintains that for complex applications, such as those in medical fields or intricate software engineering tasks, the full model’s superior intelligence justifies its higher cost.
“For challenging applications requiring advanced intelligence, such as medical diagnosis or complex codebases, GPT-4o is the better choice,” Godement concluded. “If product differentiation relies on intelligence, GPT-4o delivers the best results.”