Elon Musk Unveils Grok-1.5: Approaching GPT-4 Performance Milestones

Updated on October 28, 2024

Mere weeks after open-sourcing Grok-1, Elon Musk’s xAI is set to launch Grok-1.5, an upgraded version of its proprietary large language model (LLM), next week.

Grok-1.5 aims to enhance reasoning and problem-solving capabilities, bringing it closer in performance to established models like OpenAI’s GPT-4 and Anthropic’s Claude 3. It can handle long contexts of up to 128,000 tokens, but it still lags behind Gemini 1.5 Pro, which supports a context window of up to 1 million tokens.

What’s New with Grok-1.5?

Initially announced last November, Grok-1 was inspired by "The Hitchhiker’s Guide to the Galaxy" and was designed to assist users in their quest for knowledge, irrespective of background or political stance. In previous benchmarks, Grok-1 surpassed Llama-2-70B and GPT-3.5.

With Grok-1.5, xAI claims notable improvements across key metrics. In tests, Grok-1.5 achieved a 50.6% score on the MATH benchmark, a 90% score on the GSM8K benchmark, and 74.1% on the HumanEval benchmark, demonstrating significant enhancements in both coding and math-related tasks.

Furthermore, Grok-1.5 achieved an 81.3% score on the MMLU benchmark, a marked improvement over Grok-1’s 73%. With a context window of up to 128,000 tokens, 16 times that of its predecessor, Grok-1.5 can process far more information at once, making it adept at analyzing and summarizing lengthy documents while maintaining effective instruction-following capabilities.
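To make these figures concrete, the short Python sketch below works through the arithmetic using only the numbers quoted above. The variable names are illustrative, and the predecessor's context size is merely what the "16 times" claim implies from rounded figures, not an official specification.

```python
# Figures as quoted in the article (rounded by the source).
GROK_15_CONTEXT_TOKENS = 128_000
CONTEXT_MULTIPLIER = 16
GROK_1_MMLU = 73.0
GROK_15_MMLU = 81.3

# Context size implied for Grok-1 by the "16 times" claim (an estimate).
implied_grok_1_context = GROK_15_CONTEXT_TOKENS // CONTEXT_MULTIPLIER

# MMLU improvement in percentage points.
mmlu_gain_points = round(GROK_15_MMLU - GROK_1_MMLU, 1)

print(f"Implied Grok-1 context window: about {implied_grok_1_context:,} tokens")
print(f"MMLU gain over Grok-1: {mmlu_gain_points} percentage points")
```

On these numbers, the claim implies a predecessor window of roughly 8,000 tokens and an MMLU gain of 8.3 percentage points.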

Competing with Leading Models

Grok-1.5 not only outdoes Grok-1 but also narrows the performance gap with leading models such as Gemini 1.5 Pro, GPT-4, and Claude 3. For example, Grok-1.5’s 81.3% on the MMLU benchmark edges out the recently released Mistral Large but still trails Gemini 1.5 Pro’s 83.7%, GPT-4’s 86.4%, and Claude 3 Opus’s 86.8%. On the GSM8K benchmark, it similarly falls just short of the offerings from Google, OpenAI, and Anthropic. Notably, Grok-1.5 excels on HumanEval, outperforming all models except Claude 3 Opus.
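As a rough illustration of how narrow the MMLU gap is, the Python sketch below computes the difference in percentage points between Grok-1.5 and each model whose score is quoted above. It uses only the figures reported in this article; the dictionary layout and names are illustrative, and Mistral Large is omitted because no score is given for it here.

```python
# MMLU scores (percent) as quoted in the article.
mmlu_scores = {
    "Grok-1.5": 81.3,
    "Gemini 1.5 Pro": 83.7,
    "GPT-4": 86.4,
    "Claude 3 Opus": 86.8,
}

baseline = mmlu_scores["Grok-1.5"]

# Gap to each competitor, in percentage points (positive = competitor ahead).
mmlu_gaps = {
    model: round(score - baseline, 1)
    for model, score in mmlu_scores.items()
    if model != "Grok-1.5"
}

for model, gap in mmlu_gaps.items():
    print(f"{model} leads Grok-1.5 by {gap} points on MMLU")
```

By these figures, Grok-1.5 sits 2.4 points behind Gemini 1.5 Pro, 5.1 behind GPT-4, and 5.5 behind Claude 3 Opus.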

Brian Roemmele, a tech consultant, anticipates that Grok-2, currently in training, will likely establish itself as one of the most powerful LLM platforms upon release, surpassing OpenAI’s models on numerous metrics.

Availability of Grok-1.5

xAI plans to deploy Grok-1.5 next week, starting with early testers and users of the Grok chatbot on the X platform (formerly Twitter). The rollout will be phased, introducing new features, including a potential "fun mode," while gradually expanding access to more users.

Musk's initial release of Grok on X was part of a strategy to boost adoption of both Grok and the X platform. Grok is currently available through the platform’s ‘Premium+’ subscription for $16/month, but Musk recently announced that it will also be accessible to all $8/month Premium subscribers. Additionally, verified followers at certain subscription levels will receive Premium benefits, including free access to Grok.
