Today, Inflection AI, the Palo Alto-based startup co-founded by DeepMind’s Mustafa Suleyman and LinkedIn’s Reid Hoffman, unveiled its latest foundation model, Inflection-2.5.
Inflection-2.5 improves markedly on its predecessor and approaches the performance of OpenAI's GPT-4, particularly in STEM subjects. The new model powers the company's Pi assistant, which competes with ChatGPT and Gemini and is accessible via mobile and web platforms.
Advancing AI Competition
This launch is a strategic move in a fast-moving AI landscape in which companies are racing to challenge OpenAI's dominance. Recently, Anthropic introduced Claude 3 Opus, which the company reports outperforms GPT-4 on several common benchmarks.
Inflection-2.5: Performance Overview
Since its launch, Inflection AI has aimed to create an “empathetic, useful, and safe” AI that offers a more personal conversational experience than other models, including those in the GPT series. The company's empathetic fine-tuning technique gives Pi a distinctive personality and a high emotional quotient (EQ).
With Inflection-2.5, the startup aims to bolster the model’s IQ, particularly in areas like physics and mathematics. Users can now engage with Pi on a wide array of topics, from hobbies to coding, biology assignments, and business planning.
Benchmark Performance
In benchmark evaluations, Inflection-2.5 improves significantly on Inflection-1 and narrows the gap with GPT-4 without closing it. On the MMLU benchmark, which tests knowledge across a broad range of academic and professional subjects, Inflection-2.5 scored 85.5, just shy of GPT-4’s 87.3. On STEM exams, it scored 63 on the Hungarian Math exam compared with GPT-4’s 68 and reached the 85th percentile on the Physics GRE versus GPT-4’s 97th percentile.
On the GSM8K benchmark, which contains about 8,500 high-quality grade school math problems, Inflection-2.5 scored 86.3 compared with GPT-4’s 92. On the zero-shot HumanEval test, which assesses coding ability, it scored 73.8 versus GPT-4’s 79.3.
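To put the scores quoted above on a common footing, the short Python sketch below divides each Inflection-2.5 score by the corresponding GPT-4 score. The names `SCORES`, `relative_scores`, and `mean` are illustrative and not from Inflection AI, and the unweighted average is a simplifying assumption. In particular, dividing percentile ranks (as in the Physics GRE row) is only a rough comparison, and this is not necessarily how the company derives any aggregate figure of its own.

```python
# Scores as quoted in the article: (Inflection-2.5, GPT-4).
# The dictionary layout and names are illustrative assumptions.
SCORES = {
    "MMLU": (85.5, 87.3),
    "Hungarian Math": (63.0, 68.0),
    "Physics GRE (percentile)": (85.0, 97.0),  # percentile ranks, not raw scores
    "GSM8K": (86.3, 92.0),
    "HumanEval (zero-shot)": (73.8, 79.3),
}


def relative_scores(scores):
    """Return each benchmark's Inflection-2.5 score as a fraction of GPT-4's."""
    return {name: inflection / gpt4 for name, (inflection, gpt4) in scores.items()}


def mean(values):
    """Unweighted arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    ratios = relative_scores(SCORES)
    for name, ratio in ratios.items():
        print(f"{name:26s} {ratio:6.1%}")
    print(f"{'Unweighted mean':26s} {mean(ratios.values()):6.1%}")
```

Every ratio comes out below 100%, consistent with the article's point that Inflection-2.5 trails GPT-4 on each of these tests. The gap is smallest on MMLU and largest on the Physics GRE.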
Efficient Training and Real-Time Capabilities
Although Inflection-2.5 does not surpass GPT-4, Inflection AI emphasized that the model achieves "94% of GPT-4's performance" with a more efficient training process, using only 40% of the training compute used for GPT-4.
Like GPT-4, Inflection-2.5 incorporates real-time web search, giving users up-to-date information on current events. This is a significant advancement for the Pi assistant, which is designed to be accessible to everyone. However, the quality of web-retrieved results may vary, and the reported benchmarks do not assess that aspect.
How to Access Inflection-2.5
Inflection AI has already integrated the new model into its Pi chatbot, enabling users to test its capabilities immediately. While the company hasn't detailed specific user benefits from the upgrade, it has highlighted a positive impact on user sentiment, engagement, retention, and overall organic growth of the chatbot.
Currently, the Pi chatbot is available on Android, iOS, web, and desktop. It has one million daily and six million monthly active users, who have exchanged over four billion messages, with an average conversation lasting 33 minutes.