AI Startup Outshines Google in Stanford's Latest Model Rankings

In a surprising twist, a foundation model from the startup Writer has surpassed Google in the latest performance rankings conducted by researchers at Stanford University. The Palmyra X V3 model, with 72 billion parameters, emerged as the highest-scoring non-OpenAI model on the Stanford leaderboard for the Holistic Evaluation of Language Models (HELM) Lite. Despite having fewer parameters than several rivals, Palmyra secured third place overall, while Google's PaLM 2 claimed the fourth spot.

Another standout was Yi-34B, developed by the Chinese startup 01.ai under the leadership of Kai-Fu Lee. This open-source, 34 billion-parameter model, trained on three trillion tokens, outperformed models such as Mistral 7B, Anthropic's Claude 2, and Meta's Llama 2 to earn a place on Stanford's leaderboard.

As expected, OpenAI's GPT-4 maintained its position at the top of the Stanford rankings with a significant lead. Released in March 2023, GPT-4 excelled on multiple benchmarks, including OpenBookQA for elementary science questions, MMLU for exam-style questions across dozens of subjects, and LegalBench, which tests models on legal reasoning tasks. OpenAI's GPT-4 Turbo followed, securing second place. Unveiled at DevDay 2023, GPT-4 Turbo was designed for operational efficiency and can process 16 times more text than its predecessor. However, it fell short of GPT-4's performance due to difficulties in adhering to provided instructions.

Percy Liang, an associate professor at Stanford, remarked on the unexpected results, highlighting how smaller models have recently outshined larger ones. “Some recent models are very chatty; they sometimes provide the correct answer in the wrong format, even when instructed otherwise,” he noted.

The HELM Lite framework was intentionally designed as a lightweight yet broad evaluation. Building on Stanford's earlier, more expansive HELM framework, the latest test focuses specifically on model capabilities rather than safety; the research team plans to introduce a separate safety-focused benchmark developed in collaboration with MLCommons.

HELM Lite evaluates various competencies, including machine translation, medical question answering, and reading comprehension. The project drew inspiration from the Open LLM leaderboard on Hugging Face, where Yi-34B currently ranks first. Notably, the Stanford team did not have access to the internals of closed models like GPT-4 and Claude. Instead, they queried the models' standard interfaces and carefully crafted prompts to elicit outputs in the desired format.
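To illustrate the kind of prompt engineering described above, here is a minimal, hypothetical sketch (not HELM's actual code; the function names and formats are assumptions) of a few-shot multiple-choice prompt that pins down the answer format, along with a lenient parser for the "chatty" responses Liang mentions, which may bury the correct letter in prose:

```python
import re

def build_prompt(question, choices, examples):
    """Assemble a few-shot multiple-choice prompt that fixes the answer format.

    `examples` is a list of (question, choices, answer_letter) tuples used
    as in-context demonstrations of the expected one-letter answer.
    """
    lines = ["Answer with a single letter (A, B, C, or D) and nothing else.", ""]
    for ex_question, ex_choices, ex_answer in examples:
        lines.append(f"Question: {ex_question}")
        for letter, choice in zip("ABCD", ex_choices):
            lines.append(f"{letter}. {choice}")
        lines.append(f"Answer: {ex_answer}")
        lines.append("")
    lines.append(f"Question: {question}")
    for letter, choice in zip("ABCD", choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)

def parse_answer(response):
    """Extract a standalone answer letter, even from a verbose reply."""
    match = re.search(r"\b([ABCD])\b", response.strip())
    return match.group(1) if match else None
```

For example, `parse_answer("Sure! The answer is B, because ...")` recovers `"B"` even though the model ignored the format instruction; a strict exact-match scorer would have marked the same reply wrong, which is the failure mode Liang describes.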
