Forget GPT-5! OpenAI Unveils New AI Model Family o1, Boasting PhD-Level Performance

Since the launch of OpenAI’s powerful GPT-4 large language model (LLM) in March 2023, users and developers have eagerly anticipated the release of its successor, GPT-5. However, OpenAI is taking a different route by introducing a new family of models: the o1 series.

Introduction of the o1 Model Family

OpenAI has unveiled its latest AI models, o1-preview and o1-mini, designed specifically to tackle complex tasks and solve challenging problems more effectively than the GPT series.

Available today to ChatGPT Plus users, o1-preview is limited to 30 messages per week and o1-mini to 50. Because these are early models, their feature set is still being built out: they currently lack capabilities available in GPT-4o, such as web browsing and file uploads.

Superior Capabilities of o1 Models

OpenAI asserts that the o1 series excels in handling intricate problems across various fields, including science, healthcare, and technology. These models are envisioned to assist physicists in formulating complex equations and help healthcare researchers annotate cell sequencing data effectively.

o1-mini in particular is aimed at developers, who can use it to build and execute multi-step workflows, debug code, and solve programming challenges.

o1-preview: PhD-Level Performance

The o1-preview model spends more time thinking before it responds, approaching problems much as a person would. In OpenAI's testing, the next update to the model performed similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. The same reasoning model reached the 89th percentile in Codeforces programming competitions and solved 83% of the problems on a qualifying exam for the International Mathematics Olympiad (IMO), compared with only 13% for GPT-4o.

o1-preview is currently available to ChatGPT Plus and Team users, and Enterprise and Edu users will gain access next week. Developers who qualify for API usage tier 5 can also start building with both o1 models, although rate limits will initially be low.

o1-mini: Affordable and Efficient

Alongside o1-preview, OpenAI introduced o1-mini, a smaller model that offers faster and cheaper reasoning. Although it specializes in coding and STEM domains, o1-mini delivered strong results: it scored 70% on the American Invitational Mathematics Examination (AIME), close to the full o1 model's 74.4% and well ahead of o1-preview's 44.6%, at a significantly lower cost. In coding evaluations, it achieved an Elo rating of 1650 on Codeforces, placing it at roughly the 86th percentile of competitive programmers on the platform.

Priced 80% lower than o1-preview, o1-mini suits developers and researchers who need reasoning capabilities but not the broad world knowledge of the larger model. It will be available to ChatGPT Plus, Team, Enterprise, and Edu users, with plans to extend access to ChatGPT Free users in the future.

Safety and Security Improvements

OpenAI says both models were built with an improved safety training approach. On one of its hardest jailbreaking tests, scored on a scale of 0 to 100, o1-preview scored 84, a substantial improvement over GPT-4o's score of 22. Because the models can reason about safety rules in context, they are better equipped to handle unsafe prompts and reduce the risk of generating inappropriate content.

OpenAI has also established partnerships with the U.S. and U.K. AI Safety Institutes, facilitating the evaluation and testing of future AI systems.

Future Developments for the o1 Series

While o1-preview and o1-mini are already powerful problem-solving tools, OpenAI describes them as only a starting point. The company plans to update the models regularly and to add features such as browsing, file uploads, and function calling.

As OpenAI continues to develop both the GPT and o1 series, users can expect ongoing advancements that enhance the capabilities and accessibility of AI across diverse applications.
