First Impressions of OpenAI o1: An AI That Analyzes Everything Deeply

OpenAI unveiled its latest o1 models on Thursday, providing ChatGPT users their first opportunity to explore AI models that “think” before responding. Anticipation surrounding these models, codenamed “Strawberry,” has been significant. But does Strawberry live up to the expectations?

In short, somewhat.

While OpenAI o1 demonstrates strong reasoning capabilities and can tackle complex questions, it feels like a mixed bag when compared to GPT-4o. The cost of using o1 is approximately four times higher than GPT-4o. Furthermore, this new model does not include the tools, multimodal features, and speed that made GPT-4o stand out. OpenAI itself acknowledges that “GPT-4o is still the best choice for most prompts” on its support page, while also noting o1's struggles with simpler requests.

“It’s impressive, but the improvements are not groundbreaking,” commented Ravid Shwartz Ziv, an NYU professor specializing in AI models. “It excels at specific problems, yet it lacks a comprehensive upgrade.”

Given these trade-offs, o1 is best reserved for genuinely hard questions, the kind most people aren't yet asking of generative AI, partly because today's models haven't handled them well. Nevertheless, o1 marks a tentative step in the right direction.

Engaging with Big Ideas

OpenAI o1 distinguishes itself by “thinking” prior to answering, breaking down larger problems into manageable steps and discerning where errors may occur. This method of “multi-step reasoning” isn’t novel—researchers have suggested it for years, and platforms like You.com utilize it for complex queries—but it has only recently become feasible.
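OpenAI hasn't disclosed how o1 implements this internally, but the general pattern (decompose the problem, work the steps, then synthesize an answer) can be approximated with any chat model. Below is a minimal two-pass sketch of that pattern using the OpenAI Python SDK; the prompts, the gpt-4o model choice, and the two-call structure are illustrative assumptions, not OpenAI's actual method.

```python
# A rough approximation of multi-step reasoning: first ask the model to
# break a problem into steps, then ask it to work through those steps.
# This illustrates the general pattern only; o1's internal process is not public.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4o"   # any capable chat model works for this illustration

question = "Can two ovens handle Thanksgiving dinner for 11 people?"

# Pass 1: decompose the problem into concrete sub-steps.
plan = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": f"Break this problem into numbered sub-steps, without answering yet:\n{question}",
    }],
).choices[0].message.content

# Pass 2: work through the sub-steps, check for mistakes, and synthesize an answer.
answer = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": (
            f"Problem: {question}\n\nPlan:\n{plan}\n\n"
            "Work through each step, check for mistakes, then give a final recommendation."
        ),
    }],
).choices[0].message.content

print(answer)
```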

“There’s great enthusiasm within the AI community,” said Kian Katanforoosh, CEO of Workera and adjunct lecturer at Stanford. “By training a reinforcement learning algorithm alongside some of OpenAI’s language model techniques, we can create step-by-step reasoning that allows AI to approach large ideas methodically.”

However, OpenAI o1 also comes with a hefty price tag. Typically, users pay for input and output tokens, but o1 introduces an additional layer (the breakdown of complex problems), leading to significant unseen computational costs. OpenAI keeps some aspects of this process under wraps to maintain a competitive edge, but users still incur charges for "reasoning tokens," highlighting the need for careful usage to prevent excessive token expenses.
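One practical way to keep those hidden costs visible is to inspect the token usage returned with each API response. The sketch below uses the OpenAI Python SDK and assumes the completion_tokens_details.reasoning_tokens field OpenAI documented for the o1 series at launch, along with the o1-preview model name; both may differ depending on your SDK version and account.

```python
# Rough cost check for an o1 call: reasoning tokens are billed as output
# tokens even though the reasoning itself is never shown to the user.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # assumed model name; adjust to what your account exposes
    messages=[{"role": "user", "content": "Plan Thanksgiving dinner for 11 people with two ovens."}],
)

usage = response.usage
# completion_tokens_details.reasoning_tokens is how OpenAI reported hidden
# reasoning usage at launch; treat the attribute path as an assumption.
details = getattr(usage, "completion_tokens_details", None)
reasoning = getattr(details, "reasoning_tokens", 0) or 0
visible_output = usage.completion_tokens - reasoning

print(f"input tokens:            {usage.prompt_tokens}")
print(f"visible output tokens:   {visible_output}")
print(f"hidden reasoning tokens: {reasoning} (billed at the output rate)")
```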

The concept of an AI model guiding you in “walking backwards from big ideas” is indeed powerful. In practice, o1 performs well in this regard.

For instance, I asked the ChatGPT o1 preview to assist my family in planning Thanksgiving dinner—a task that benefits from unbiased reasoning. Specifically, I wanted to determine if two ovens would suffice for preparing a meal for 11 people and whether we should consider renting an Airbnb for an additional oven.

After a thoughtful 12 seconds, ChatGPT generated over 750 words, ultimately advising that two ovens should work with careful planning, enabling my family to save costs and enjoy more time together. The model methodically explained its thought process, weighing various factors like expenses, family time, and oven logistics.

Interestingly, among its suggestions was to rent a portable oven, a smart yet unexpected idea. In comparison, GPT-4o required multiple follow-up questions about my menu before offering basic advice that felt far less helpful.

While asking about Thanksgiving may seem trivial, it underscores how this tool excels in deconstructing complex tasks.

I also tasked o1 with organizing a busy workday that involved travel, several in-person meetings, and time at the office. It produced a detailed itinerary, but one that felt a tad overwhelming; sometimes the extra steps just add up to information overload.

In contrast, for simpler queries, o1 tends to overcomplicate things. For instance, when I inquired about the locations of cedar trees in America, it generated an extensive 800+ word response detailing every cedar species and their scientific classifications. It even referenced OpenAI’s policies, somewhat unnecessarily. By contrast, GPT-4o offered a succinct response in just three sentences, explaining that cedar trees can be found across the country.

Managing Expectations

From the outset, Strawberry was unlikely to meet the lofty expectations set for it. Discussions regarding OpenAI’s reasoning models had surfaced as early as November 2023, coinciding with significant board changes at OpenAI that sparked speculation about the nature of its advancements. Some even speculated that Strawberry could represent a form of AGI, the aspirational goal of OpenAI.

To clarify, OpenAI CEO Sam Altman has publicly stated that o1 is not AGI, a point that becomes evident after using it. He further moderated expectations with a tweet asserting that “o1 is still flawed, limited, and it may feel more impressive at first than it does upon longer use.”

The AI community is now grappling with a launch that falls short of initial excitement.

“The hype escalated beyond OpenAI’s control,” noted Rohan Pandey, a research engineer at the AI startup ReWorkd, which builds web scrapers using OpenAI’s models. He hopes o1’s reasoning capabilities will effectively address specialized problems where GPT-4 may falter, but he acknowledges it’s not the groundbreaking advancement that GPT-4 was for the industry.

“Everyone is eager for a significant leap in capabilities, but this release doesn’t quite deliver on that front. It’s as simple as that,” said Mike Conover, CEO of Brightwave and co-creator of Databricks’ AI model Dolly.

Assessing Value

The foundational principles behind o1 have been in development for years. Similar techniques helped Google DeepMind build AlphaGo, which in 2016 became the first AI system to defeat a world champion at the board game Go. Andy Harrison, a former Googler and now CEO of the venture firm S32, points out that AlphaGo trained by playing against itself over and over, effectively teaching its way to superhuman performance.

This raises a longstanding debate within the AI realm.

One camp believes we can automate workflows through this methodical process, while another posits that generalized intelligence and reasoning would eliminate the need for a structured approach, enabling AI to make judgments like humans. Harrison aligns with the former perspective, asserting that the latter requires a level of trust in AI that we have yet to achieve.

Others view o1 less as a decision-maker and more as a tool for critically examining one’s thought processes concerning significant choices.

Katanforoosh shared an anecdote about an interview he had coming up for a data scientist role: he had only 30 minutes and a specific set of skills he wanted to evaluate. By working through the plan with o1, he was able to refine his approach within those time constraints.

The pressing question remains: is this valuable tool worth its steep price? At a time when AI models have generally been getting cheaper, o1 stands out as one of the first in quite some time to cost more.
