Elon Musk Unveils Grok-1.5: Approaching GPT-4 Performance Milestones

Mere weeks after open-sourcing Grok-1, Elon Musk’s xAI is set to launch Grok-1.5, an upgraded version of its proprietary large language model (LLM), next week.

Grok-1.5 aims to enhance reasoning and problem-solving capabilities, bringing it closer in performance to established models like OpenAI’s GPT-4 and Anthropic’s Claude 3. Its context window of up to 128,000 tokens handles long inputs, but it remains well short of Gemini 1.5 Pro, which supports a context window of up to 1 million tokens.

What’s New with Grok-1.5?

Initially announced last November, Grok-1 was inspired by "The Hitchhiker’s Guide to the Galaxy" and was designed to assist users in their quest for knowledge, irrespective of background or political stance. In previous benchmarks, Grok-1 surpassed Llama-2-70B and GPT-3.5.

With Grok-1.5, xAI claims notable improvements across key metrics. In tests, Grok-1.5 achieved a 50.6% score on the MATH benchmark, a 90% score on the GSM8K benchmark, and 74.1% on the HumanEval benchmark, demonstrating significant enhancements in both coding and math-related tasks.

Furthermore, Grok-1.5 achieved an 81.3% score on the MMLU benchmark, a marked improvement over Grok-1’s 73%. With a context window of up to 128,000 tokens, 16 times that of its predecessor, Grok-1.5 can process far more information at once, making it adept at analyzing and summarizing lengthy documents while maintaining effective instruction-following capabilities.
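As a quick sanity check on those figures, the short Python sketch below works out what the "16 times" claim implies for Grok-1's context window and how many percentage points Grok-1.5 gained on MMLU. The variable names are illustrative, and the implied Grok-1 figure is derived only from the numbers quoted here, not from an official xAI specification.

```python
# Figures quoted in the article; variable names are illustrative only.
GROK_1_5_CONTEXT_TOKENS = 128_000
CONTEXT_MULTIPLIER = 16  # "16 times that of its predecessor"

# Context window implied for Grok-1 (an inference from the article's numbers,
# not an official specification).
implied_grok_1_context = GROK_1_5_CONTEXT_TOKENS // CONTEXT_MULTIPLIER

MMLU_GROK_1 = 73.0
MMLU_GROK_1_5 = 81.3

# Improvement in percentage points, rounded to one decimal place.
mmlu_gain_points = round(MMLU_GROK_1_5 - MMLU_GROK_1, 1)

print(f"Implied Grok-1 context window: about {implied_grok_1_context:,} tokens")
print(f"MMLU gain: {mmlu_gain_points} percentage points")
```

On the article's numbers, this puts Grok-1's context window at roughly 8,000 tokens and the MMLU gain at 8.3 percentage points.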

Competing with Leading Models

Grok-1.5 not only outperforms Grok-1 but also narrows the gap with leading models such as Gemini 1.5 Pro, GPT-4, and Claude 3. For example, Grok-1.5’s 81.3% on the MMLU benchmark edges out the recently released Mistral Large but still trails Gemini 1.5 Pro’s 83.7%, GPT-4’s 86.4%, and Claude 3 Opus’s 86.8%. On the GSM8K benchmark, it similarly falls just short of the offerings from Google, OpenAI, and Anthropic. Notably, Grok-1.5 excels on HumanEval, outperforming all models except Claude 3 Opus.
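To make the size of the MMLU gap concrete, here is a minimal Python sketch that computes how far each leading model sits ahead of Grok-1.5, using only the scores quoted above. The helper name `gaps_to` is hypothetical, and Mistral Large is left out because the article does not give its score.

```python
# MMLU scores (percent) as quoted in the article.
mmlu_scores = {
    "Grok-1.5": 81.3,
    "Gemini 1.5 Pro": 83.7,
    "GPT-4": 86.4,
    "Claude 3 Opus": 86.8,
}


def gaps_to(baseline: str, scores: dict[str, float]) -> dict[str, float]:
    """Return each other model's lead over `baseline`, in percentage points."""
    base = scores[baseline]
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != baseline
    }


gaps = gaps_to("Grok-1.5", mmlu_scores)

# List the smallest gap first.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap} points ahead of Grok-1.5")
```

On these figures, Gemini 1.5 Pro leads by 2.4 points, GPT-4 by 5.1, and Claude 3 Opus by 5.5.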

Brian Roemmele, a tech consultant, anticipates that Grok-2, currently in training, will establish itself as one of the most powerful LLMs upon release, surpassing OpenAI’s models on numerous metrics.

Availability of Grok-1.5

xAI plans to deploy Grok-1.5 next week, starting with early testers and users of the Grok chatbot on the X platform (formerly Twitter). The rollout will be phased, introducing new features, including a potential "fun mode," while gradually expanding access to more users.

Musk’s initial release of Grok on X was part of a strategy to boost adoption of both Grok and the X platform. Grok is currently available through the platform’s ‘Premium+’ subscription for $16/month, but Musk recently announced that it will also be accessible to all $8/month Premium subscribers. Additionally, verified followers at certain subscription levels will receive Premium benefits, including free access to Grok.
