Did a Silicon Valley team copy a Tsinghua-backed large model?

The Battle of Big Models: The Controversial Launch of Llama3-V

The competitive landscape among large-model developers is shifting, and recent events have sparked considerable debate. A team from Stanford University recently introduced Llama3-V, a multimodal large model claiming to rival established models such as GPT-4V, Gemini Ultra, and Claude Opus at a training cost of just $500. Two of its authors, Siddharth Sharma and Aksh Garg, are undergraduates in Stanford's computer science department who have published multiple machine learning papers and worked with major companies such as Tesla and SpaceX. Llama3-V quickly gained traction, even reaching the trending charts on Hugging Face, a pivotal platform in the machine learning community.

However, the excitement around Llama3-V was short-lived. Users soon highlighted striking similarities between Llama3-V and MiniCPM-Llama3-V 2.5, a model released in May by ModelBest (Mianbi Intelligence), a company affiliated with Tsinghua University. Observers noted that the two models share nearly identical structures, code, and configuration files, differing mainly in variable names, and that Llama3-V's code appears to be a reformatted version of MiniCPM-Llama3-V 2.5's, behaving similarly across noise-perturbed versions of the weights. Notably, Llama3-V uses the tokenizer from MiniCPM-Llama3-V 2.5, including several of its special tokens. Reports suggested that simply renaming variables allowed Llama3-V to run successfully on the MiniCPM-V code, raising concerns about the originality of its development.

On June 3, ModelBest CEO Li Dahai voiced his concerns on social media, pointing out that Llama3-V could recognize the Tsinghua Bamboo Slips, a collection of ancient manuscripts, and produced errors identical to MiniCPM-Llama3-V 2.5's on a dataset that had never been made public. Li emphasized that this recognition capability had been built through months of painstakingly scanning and annotating the manuscripts. He also reported that under high levels of Gaussian noise, the two models produced strikingly similar patterns of correct and incorrect outputs.

When asked how such incidents could be prevented, Li acknowledged the difficulty, framing it as a matter of academic ethics. As the allegations mounted, the Llama3-V team deleted comments accusing them of plagiarism, withdrew the project from open-source platforms, and issued an apology. Sharma and Garg explained that they had not written the code themselves; that work was handled by Mustafa Aljadery, a USC graduate, who had not yet released the training code.

The practice of "repackaging" large models has long been pervasive in the industry. Some advocate making extensive use of open-source resources, while others argue that true innovation requires proprietary development. Modern large models trace their origins to the Transformer neural network architecture introduced by Google Brain in 2017. Building on this framework, companies pre-train models on vast datasets so that they generalize well and can be adapted quickly to downstream tasks.

Essentially, the "core" of a large model lies in the neural network architecture and pre-training, while the "shell" refers to fine-tuning: adjusting a pre-trained model for specific tasks. Fine-tuning is typically a supervised process that uses labeled data to direct the model's learning. AI analyst Zhang Yi noted that "repackaging" often amounts to modifying variable names and fine-tuning an open-source model to produce an adaptation for a specific scenario.
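To make the fine-tuning "shell" concrete, the sketch below shows what minimal supervised fine-tuning of an open-source causal language model might look like with the Hugging Face Transformers library. The base model name (gpt2) and the two instruction/response pairs are illustrative assumptions, not details from Llama3-V or MiniCPM.

```python
# Minimal sketch of supervised fine-tuning on top of a pre-trained model.
# The base model and the toy dataset are placeholders for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in for any open-source base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny labeled dataset: prompt/response pairs that steer the model toward a task.
pairs = [
    ("Translate to French: good morning", "bonjour"),
    ("Summarize: The cat sat on the mat.", "A cat sat on a mat."),
]
texts = [f"{prompt}\n{answer}{tokenizer.eos_token}" for prompt, answer in pairs]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):  # a few passes over the toy data
    optimizer.zero_grad()
    # For causal LM fine-tuning the labels are the input ids themselves
    # (next-token prediction); a real setup would mask padded positions.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {outputs.loss.item():.3f}")
```

The point of the sketch is that none of this touches the "core": the architecture and pre-trained weights come entirely from the base model, and the shell work only nudges its behavior with a small amount of labeled data.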

Suki, a former designer at Yuque and co-founder of the AI assistant Monica, outlined four phases of "repackaging" (the first three are sketched in code after the list):

1. Directly calling OpenAI's APIs and passing their responses through.

2. Constructing prompts, i.e., carefully worded instructions that steer a general-purpose model toward a specific task.

3. Vectorizing specific datasets to build proprietary databases that can address questions beyond ChatGPT's capabilities.

4. Fine-tuning the model on high-quality Q&A datasets to improve task-specific understanding, which consumes fewer tokens at inference time than the prompt-based methods above.
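
The hedged sketch below illustrates the first three phases end to end: calling a hosted API, wrapping the request in a task-specific prompt, and answering from a small vectorized document set. The model names, the two toy documents, and the in-memory cosine-similarity lookup are assumptions for illustration, not details from any product mentioned above; phase 4 corresponds to the fine-tuning sketch shown earlier.

```python
# Phases 1-3 of "repackaging": API call, prompt construction, and a tiny
# vectorized knowledge base. All names and data here are illustrative.
import numpy as np
from openai import OpenAI  # assumes the openai>=1.0 client and an API key in the environment

client = OpenAI()

# Phase 3: vectorize a private document set into a small in-memory "database".
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available on weekdays from 9:00 to 18:00.",
]
embeddings = client.embeddings.create(model="text-embedding-3-small", input=documents)
doc_vectors = np.array([item.embedding for item in embeddings.data])

def retrieve(question: str) -> str:
    """Return the stored document most similar to the question (cosine similarity)."""
    q_emb = client.embeddings.create(model="text-embedding-3-small", input=[question])
    q = np.array(q_emb.data[0].embedding)
    scores = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return documents[int(np.argmax(scores))]

question = "Can I get my money back two weeks after buying?"
context = retrieve(question)

# Phase 2: construct a prompt that injects the retrieved context.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Phase 1: forward the prompt to a hosted model and relay its response.
reply = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```

Each phase wraps more of its own engineering around the hosted model, but the underlying "core" still belongs to the provider; only phase 4, fine-tuning, begins to change the model's weights at all.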

In conclusion, this controversy highlights a contentious yet common pattern in AI development: the continual adaptation of existing models to meet niche demands across diverse fields.
