"Questions Arise Over Performance of New Open Source AI Leader Reflection 70B, Accused of ‘Fraud’"

In the span of a single weekend, the newest contender among open-source AI models came under significant scrutiny, casting doubt on its headline claims and its reputation.

Reflection 70B, a variant of Meta’s Llama 3.1 large language model released by the New York startup HyperWrite (formerly OthersideAI), was hailed for achieving impressive benchmark scores. However, subsequent evaluations by independent testers raised questions about the validity of those claims.

On September 6, 2024, HyperWrite co-founder Matt Shumer proclaimed Reflection 70B as "the world's top open-source model" in a post on the social network X. Shumer detailed the model's use of "Reflection Tuning," a technique that enables LLMs to verify the accuracy of their outputs before presenting them to users, enhancing performance in various domains.
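Shumer’s announcement did not include implementation details, but the general idea behind reflection-style generation can be sketched as a multi-pass loop: the model drafts an answer, critiques its own draft, and revises before responding. The snippet below is a minimal, hypothetical illustration of that pattern only; the `generate` function is a placeholder for whatever LLM completion API is used, and the announced "Reflection Tuning" technique reportedly trains this behavior into the model itself rather than orchestrating it with separate calls.

```python
# Minimal sketch of a reflection-style generation loop.
# This is an illustrative assumption about the general idea, not
# HyperWrite's actual training or inference code.

def generate(prompt: str) -> str:
    """Hypothetical LLM call; replace with your provider's completion API."""
    raise NotImplementedError

def answer_with_reflection(question: str) -> str:
    # Pass 1: produce a draft answer.
    draft = generate(f"Answer the question:\n{question}")

    # Pass 2: ask the model to check its own draft for mistakes.
    critique = generate(
        "Review the following answer for factual or logical errors and "
        f"list any problems you find.\n\nQuestion: {question}\nAnswer: {draft}"
    )

    # Pass 3: revise the draft using the critique; only the corrected
    # answer is shown to the user.
    final = generate(
        "Rewrite the answer so it fixes the problems listed in the critique."
        f"\n\nQuestion: {question}\nDraft: {draft}\nCritique: {critique}"
    )
    return final
```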

However, by September 7, an organization called Artificial Analysis publicly challenged this assertion. Their evaluation found that Reflection 70B scored the same on MMLU as Llama 3 70B and fell notably short of Meta’s Llama 3.1 70B, a stark contrast with the results HyperWrite had initially published.

Shumer later said that the model’s weights had been corrupted during the upload to Hugging Face, which could explain the gap between its public performance and HyperWrite’s internal tests.

On September 8, after testing a private API version of the model, Artificial Analysis acknowledged seeing impressive, though unverifiable, performance that still did not match HyperWrite’s original claims. They also raised pointed questions about why an untested version of the model had been publicly released and why the weights behind the private API version had not been published.

Community members across AI-focused Reddit threads also voiced skepticism regarding Reflection 70B’s performance and origins. Some claimed it appeared to be a variant of Llama 3 rather than the anticipated Llama 3.1, raising further doubts about its legitimacy. One user even accused Shumer of perpetrating "fraud in the AI research community."

Despite the backlash, some users defended Reflection 70B, citing strong performance in their use cases. However, the rapid transition from excitement to criticism highlights the volatile nature of the AI landscape.

For 48 hours, the AI research community awaited updates from Shumer on the model’s performance and corrected weights. On September 10, he finally addressed the controversy, saying:

"I got ahead of myself with this announcement, and I apologize. We made decisions based on the information we had. I know many are excited about this potential yet skeptical. A team is working diligently to ascertain what occurred. Once we've clarified the facts, we'll maintain transparency with the community."

Shumer referenced a post from Sahil Chaudhary, founder of Glaive AI, who acknowledged the confusion around the model’s claims and the difficulty of reproducing its benchmark scores.

Chaudhary stated:

“I want to address the valid criticisms. I’m investigating the situation and will provide a transparent summary soon. At no point did I run models from other providers, and I aim to explain the discrepancies, including unexpected behaviors like skipping certain terms. I have much to uncover regarding the benchmarks and I appreciate the community's patience as I rebuild trust.”

The situation remains unresolved, with continued skepticism surrounding both Reflection 70B and its claims within the open-source AI community.
