On Thursday, Ai2, a nonprofit AI research institute based in Seattle, unveiled its latest creation, the Tulu 3 405B model, claiming that it outperforms DeepSeek V3, one of the leading systems from Chinese tech company DeepSeek.
Tulu 3 405B doesn’t just top DeepSeek’s V3; according to Ai2’s internal tests, it also surpasses OpenAI’s GPT-4o on certain AI benchmarks. But here’s the twist: while GPT-4o and DeepSeek V3 keep their architectures under wraps, Tulu 3 405B is open source and permissively licensed, so anyone is free to replicate it from scratch.
A spokesperson for Ai2 told us that the institute believes Tulu 3 405B “highlights the U.S.’ potential to lead the way in global generative AI innovation.”
“This milestone is pivotal for open AI’s future,” the spokesperson continued, “showing that the U.S. can be a leader in competitive, open-source models independent of the tech giants.”
A Heavyweight in the AI Arena
With 405 billion parameters, Tulu 3 405B is a heavyweight in AI. Ai2 says training the model required 256 GPUs running in parallel. All else being equal, models with more parameters tend to perform better on problem-solving tasks, which makes Tulu 3 405B an impressive engineering feat.
One of the innovations behind its success is reinforcement learning with verifiable rewards (RLVR). This technique trains the model on tasks whose outcomes can be checked programmatically, such as solving math problems and following specific instructions, rewarding only completions that verify as correct and thereby yielding more accurate, reliable behavior.
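To make the idea concrete, here is a minimal, illustrative sketch of the verifiable-reward step. This is not Ai2’s implementation; the verifier, the sample completions, and the function names are hypothetical stand-ins for the core idea: a completion earns a reward of 1.0 only when its outcome can be checked programmatically.

```python
# Toy sketch of reinforcement learning with verifiable rewards (RLVR).
# NOT Ai2's code: the verifier and completions below are illustrative.
import re

def verify_math_answer(completion: str, expected: float) -> bool:
    """Check a completion by extracting its final number and comparing
    it to the known answer -- the 'verifiable outcome'."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return bool(numbers) and float(numbers[-1]) == expected

def rlvr_rewards(completions: list[str], expected: float) -> list[float]:
    """Binary rewards: 1.0 for verified completions, 0.0 otherwise.
    In a real pipeline these rewards would drive a policy update."""
    return [1.0 if verify_math_answer(c, expected) else 0.0
            for c in completions]

# Hypothetical sampled completions for "What is 12 * 7?" (answer: 84).
samples = [
    "12 * 7 = 84",
    "Twelve times seven is 74",
    "First, 12 * 7 = 84, so the answer is 84.",
]
print(rlvr_rewards(samples, expected=84.0))  # → [1.0, 0.0, 1.0]
```

The appeal of this setup is that the reward signal comes from an objective check rather than a learned reward model, so it can’t be gamed as easily during training.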
Beating the Competition
Ai2’s Tulu 3 405B didn’t just outclass DeepSeek’s V3 and GPT-4o on PopQA (a set of 14,000 specialized knowledge questions sourced from Wikipedia); it also outperformed Meta’s Llama 3.1 405B. Tulu 3 405B also scored highest on GSM8K, a benchmark of grade school-level math word problems, cementing its status as a top performer in its class.
For those eager to try it out, Tulu 3 405B is available for testing through Ai2’s chatbot web app. Developers can also find the training code on GitHub and the model on Hugging Face, ready for experimentation.
So grab it while it’s hot — before the next benchmark-busting AI model enters the scene.