Oh, Google. Will you ever release an AI product successfully on the first try?
Less than a month after launching Gemini, its highly anticipated ChatGPT competitor, Google faced substantial criticism for what were confirmed to be staged interactions in its promotional demo. Recent research indicates that the most advanced version available to consumers, Gemini Pro, lags behind OpenAI’s GPT-3.5 Turbo large language model (LLM) in most tasks.
The findings, presented by a team from Carnegie Mellon University and BerriAI in the paper “An In-depth Look at Gemini’s Language Abilities,” show that Gemini Pro performs slightly worse than GPT-3.5 Turbo across a range of tasks. The paper, published on arXiv.org and current as of December 19, 2023, finds that Gemini Pro's accuracy trails that of OpenAI's older model on most of the benchmarks tested.
Google’s spokesperson responded, asserting that internal research shows Gemini Pro surpasses GPT-3.5 and that a more powerful version, Gemini Ultra, is coming in early 2024, reportedly outperforming GPT-4 in internal tests. They stated, “Gemini Pro outperforms inference-optimized models like GPT-3.5 and performs comparably with other leading models.”
The researchers tested four LLMs: Google's Gemini Pro, OpenAI's GPT-3.5 Turbo and GPT-4 Turbo, and Mistral's Mixtral 8x7B. They assessed the models over four days through LiteLLM, an open-source tool that provides a unified interface for calling multiple providers' LLM APIs, using a variety of prompts, including multiple-choice questions spanning 57 subjects across STEM, the humanities, and the social sciences.
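For readers curious how such an evaluation might be wired up, here is a minimal sketch of querying several models through LiteLLM's unified `completion` interface. The model identifiers and the sample prompt are illustrative assumptions, not the paper's actual configuration or prompts.

```python
# Minimal sketch, assuming provider-prefixed model names supported by LiteLLM;
# the paper's exact prompts, shots, and decoding settings are not reproduced here.
from litellm import completion  # pip install litellm; API keys are read from environment variables

QUESTION = (
    "Which of the following is a prime number?\n"
    "A. 21  B. 27  C. 29  D. 33\n"
    "Answer with a single letter."
)

for model in ["gpt-3.5-turbo", "gpt-4-1106-preview", "gemini/gemini-pro"]:
    response = completion(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
        temperature=0.0,  # deterministic decoding, typical for benchmarking
    )
    print(model, "->", response.choices[0].message.content)
```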
In their knowledge-based QA test, Gemini Pro scored 64.12/60.63, while GPT-3.5 Turbo achieved 67.75/70.07 and GPT-4 Turbo scored 80.48/78.95. Notably, Gemini consistently favored answer choice “D,” indicating a bias potentially due to insufficient instruction-tuning for multiple-choice formats. Furthermore, it struggled with specific categories such as human sexuality and formal logic due to safety response restrictions.
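The answer-letter skew is easy to picture: on a balanced multiple-choice benchmark, an unbiased model should pick each option roughly a quarter of the time. The sketch below is purely illustrative (it is not the paper's analysis code) and shows one simple way to tally a model's answer-choice distribution and spot that kind of bias.

```python
# Illustrative only: tally which answer letter a model picks across a benchmark.
# A model choosing "D" far more often than chance (25%) on balanced data
# is exhibiting the positional/format bias described above.
from collections import Counter

def answer_letter_distribution(predictions):
    """predictions: iterable of answer letters ('A'-'D') extracted from model output."""
    counts = Counter(p.strip().upper()[:1] for p in predictions)
    total = sum(counts.values()) or 1
    return {letter: counts.get(letter, 0) / total for letter in "ABCD"}

# Toy example with made-up predictions:
print(answer_letter_distribution(["D", "D", "B", "D", "A", "D", "D", "C"]))
```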
Gemini Pro did outperform GPT-3.5 Turbo in high school microeconomics and security questions; however, these gains were minimal. When testing longer or more complex queries, Gemini Pro showed decreased accuracy compared to both GPT models, although it excelled in word sorting and symbol manipulation tasks.
In programming capabilities, Gemini was again found lacking, performing worse than GPT-3.5 Turbo in completing Python code tasks. While Gemini Pro showed promise in language translation—outperforming GPT-3.5 Turbo and GPT-4 Turbo in several languages—it also exhibited a tendency to block responses across many language pairs due to content moderation.
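For context on how Python code-completion ability is typically scored (this is a generic sketch, not the paper's evaluation harness): the model's generated completion is executed against unit tests, and a problem only counts as solved if every test passes.

```python
# Generic sketch of functional-correctness scoring for code completions.
# Real harnesses sandbox execution; this toy version runs code directly.
def passes_tests(candidate_code: str, test_code: str) -> bool:
    """Execute a candidate completion followed by its unit tests; True if nothing fails."""
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)   # define the generated function
        exec(test_code, namespace)        # run assert-based tests against it
        return True
    except Exception:
        return False

# Toy example with a hypothetical generated solution and its tests.
solution = "def add(a, b):\n    return a + b\n"
tests = "assert add(2, 3) == 5\nassert add(-1, 1) == 0\n"
print(passes_tests(solution, tests))  # True if the completion satisfies the tests
```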
The implications of these findings are significant for Google’s AI ambitions. As the release of Gemini Ultra approaches, Google may continue to trail OpenAI in generative AI performance. Interestingly, the research also indicated that Mistral's Mixtral 8x7B performed worse than GPT-3.5 Turbo across most tasks, suggesting that while Gemini Pro is not the best, it still outperforms some emerging competitors.
Overall, the study reinforces the notion that OpenAI currently maintains its lead in the generative AI landscape. As noted by experts like University of Pennsylvania professor Ethan Mollick, for most individual applications, GPT-4 remains the superior choice — at least until Gemini Ultra is released next year.