Meta has unveiled the latest addition to its Llama series of open generative AI models: Llama 3. More specifically, the company has launched two models within the Llama 3 family — Llama 3 8B, comprising 8 billion parameters, and Llama 3 70B, boasting 70 billion parameters. Meta heralds these new models as a “major leap” over their predecessors, Llama 2 7B and Llama 2 70B, particularly in terms of performance. (In AI terminology, parameters are the internal variables a model learns during training; they effectively define its skill on tasks like analyzing and generating text, and models with higher parameter counts are generally more capable.) According to Meta, both Llama 3 8B and Llama 3 70B were trained on custom-built 24,000-GPU clusters, positioning them among the top-performing generative AI models currently available.
This is a bold assertion, so how does Meta back it up? The company points to the performance of the Llama 3 models on popular AI benchmarks such as MMLU (which assesses knowledge), ARC (which evaluates skill acquisition), and DROP (which tests reasoning over text). The usefulness of these benchmarks is often debated, but they remain among the few standardized ways that AI companies like Meta evaluate their models.
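Benchmarks like MMLU are typically scored as plain accuracy: the model answers a fixed set of multiple-choice questions, and the score is the fraction it gets right. A minimal sketch of that scoring loop (the question data and function name here are hypothetical, for illustration only):

```python
def benchmark_accuracy(predictions, gold_answers):
    """Return the fraction of benchmark questions answered correctly."""
    if len(predictions) != len(gold_answers):
        raise ValueError("predictions and answers must align one-to-one")
    correct = sum(p == a for p, a in zip(predictions, gold_answers))
    return correct / len(gold_answers)

# Hypothetical model outputs vs. reference answers for four questions.
model_outputs = ["B", "C", "A", "D"]
answer_key    = ["B", "C", "B", "D"]
print(benchmark_accuracy(model_outputs, answer_key))  # 0.75
```

Leaderboard comparisons like the ones Meta cites boil down to this kind of per-benchmark accuracy, which is why differences in prompting or answer extraction between labs can shift the reported numbers.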
Notably, Llama 3 8B outperforms other open models, including Mistral’s Mistral 7B and Google’s Gemma 7B, both of which contain 7 billion parameters, across at least nine benchmarks: MMLU, ARC, DROP, GPQA (a set of biology-, physics-, and chemistry-related questions), HumanEval (a code-generation test), GSM-8K (math word problems), MATH (a mathematics benchmark), AGIEval (a problem-solving test set), and BIG-Bench Hard (a commonsense reasoning evaluation).