Google Bard has overtaken GPT-4 on the LMSYS Leaderboard and is now the second-highest-scoring chatbot, behind only GPT-4 Turbo, which continues to hold the top position. GPT-4 Turbo and GPT-4 had previously dominated the leaderboard together; Bard's ascent is attributed to its recent upgrade to Google's Gemini Pro large multimodal model.
The Chatbot Arena Leaderboard, developed by LMSYS Org (an open research group collaborating with the University of California, Berkeley, the University of California, San Diego, and Carnegie Mellon University), serves as a benchmark platform for large language models. Models are pitted against each other in “anonymous, randomized battles,” and rankings are determined using the Elo rating system, widely recognized in chess and competitive gaming.
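For readers unfamiliar with Elo, the following is a minimal sketch of the textbook Elo update applied to a single head-to-head battle. It is not LMSYS's actual implementation: the function names, the K-factor of 32, and the 400-point scale are assumptions chosen for illustration, and the leaderboard's real computation may differ in its details.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model.

    The 400-point scale is the conventional chess value (an assumption here).
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return the updated (rating_a, rating_b) after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a tie.
    k is the K-factor; 32 is an illustrative choice, not a value from LMSYS.
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    # Two equally rated models; the first one wins the battle.
    new_a, new_b = update_elo(1200.0, 1200.0, 1.0)
    print(new_a, new_b)  # 1216.0 1184.0
```

Because the update depends on the gap between the two ratings, beating a higher-rated opponent earns more points than beating a lower-rated one, which is how a newly upgraded model can climb quickly once it starts winning against top entries.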
Bard's latest version powered by Gemini Pro has become the second model to score over 1200 points on the leaderboard. This surge is part of a broader evolution as Google transitions from its previous model, PaLM 2, to the more advanced Gemini, which was first unveiled last December. The initial Pro version of Gemini has already been integrated into Bard, with the highly anticipated Gemini Ultra version expected to be released soon.
Bard also outperformed all versions of Anthropic's Claude model, and the Gemini Pro Dev API version ranked above both Claude 2.1 and GPT-3.5 Turbo. LMSYS expressed enthusiasm for this progression, stating, “The race is heating up like never before! Super excited to see what's next for Bard with the forthcoming Gemini Ultra release.”
Bard's rise is a welcome development for Google, especially after the chatbot's difficult initial rollout. Bard has since received regular updates that deepen its integration with Google applications, including YouTube and Docs. User feedback, particularly from Redditors, has played a crucial role in shaping Bard's evolution. When a Google product manager solicited input, users asked for features similar to ChatGPT's, including dedicated mobile applications, custom instructions, and image generation, many of which are already in development.
OpenAI's GPT-4 has consistently dominated model rankings, and it remains firmly positioned at the top of Stanford's HELM Leaderboard, with GPT-4 Turbo close behind. PaLM 2, the previous foundation for Bard, struggled to secure a high position there: it was surpassed by the Palmyra X V3 model from AI startup Writer, which is the highest-scoring non-OpenAI model on the HELM leaderboard.
As the landscape evolves, the competition among leading AI chatbots intensifies, setting the stage for innovative developments that will shape the future of conversational AI.