Hugging Face Unveils Advanced Code Generation Models for Enhanced AI Development

Hugging Face has launched the latest iteration of its code generation model, StarCoder2, developed in collaboration with Nvidia. This new version builds on the original StarCoder, which was introduced last May with ServiceNow. StarCoder2 generates code across more than 600 programming languages and is designed for efficiency, coming in three sizes, the largest with 15 billion parameters. The smaller variants are compact enough for developers to run effectively on personal computers.
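To see why the smaller variants fit on a personal computer, a back-of-the-envelope estimate helps: weight memory is roughly the parameter count times the bytes per weight. The sketch below is an illustrative approximation (it ignores activations, KV cache, and runtime overhead, and the byte-per-weight figures assume fp16 and 4-bit quantization), not a measurement from the article.

```python
# Rough weight-memory estimate: parameters × bytes per weight.
# Illustrative assumption only; real usage adds activation and runtime overhead.
def approx_gib(params: float, bytes_per_weight: float) -> float:
    """Approximate weight storage in GiB for a model of the given size."""
    return params * bytes_per_weight / 2**30

for name, params in [("3B", 3e9), ("7B", 7e9), ("15B", 15e9)]:
    fp16 = approx_gib(params, 2)    # 16-bit weights
    int4 = approx_gib(params, 0.5)  # 4-bit quantized weights
    print(f"{name}: ~{fp16:.1f} GiB at fp16, ~{int4:.1f} GiB at 4-bit")
```

Under these assumptions the 3-billion-parameter model needs only a few GiB of weight memory, which is why it is plausible on consumer hardware, while the 15-billion-parameter model at fp16 still calls for a high-end GPU or quantization.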

StarCoder2 has made substantial advancements: the smallest variant matches the performance of the original 15-billion-parameter StarCoder model. Notably, the StarCoder2-15B model stands out in its size category, rivaling models twice its size.

### Collaboration with Nvidia

Nvidia has played a significant role in the StarCoder2 project, providing the infrastructure necessary to train the 15-billion-parameter model. ServiceNow handled the training of the 3-billion-parameter model, while Hugging Face took charge of the 7-billion-parameter version. Nvidia also employed its NeMo framework, which aids in the development of custom generative AI models and services, to create the largest StarCoder2 model.

Jonathan Cohen, vice president of applied research at Nvidia, emphasized that their involvement introduces models that are secure and responsibly developed, promoting broader access to accountable generative AI to benefit the global community.

### Enhanced Dataset for Training

The three- and seven-billion-parameter models were trained on an extensive corpus of three trillion tokens, while the 15-billion-parameter model was trained on over four trillion tokens. At the heart of StarCoder2's capabilities is The Stack v2, a substantial dataset designed to advance code generation models.

The Stack v2 significantly exceeds its predecessor, The Stack v1, at 67.5 terabytes compared to just 6.4 terabytes. The dataset is sourced from the Software Heritage archive, a public repository of software source code, and features improved language and license detection along with better filtering heuristics, enabling models to be trained with rich repository context.

### Accessing the Dataset

To explore The Stack v2 dataset, visit Hugging Face. However, users interested in bulk downloads must secure permission from Software Heritage and Inria. Given the variety of source codes included in The Stack v2, users should review the assortment of licenses to determine if the dataset can be utilized for commercial purposes. Hugging Face has compiled a comprehensive list of relevant licenses to ensure compliance.
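Reviewing licenses before commercial use can be automated with a simple allowlist check. The sketch below is purely illustrative: the field name `detected_licenses`, the record layout, and the allowlist contents are assumptions for the example, not the actual schema of The Stack v2 or Hugging Face's compiled license list.

```python
# Hypothetical sketch: filter code samples by license before commercial use.
# The record layout and license identifiers below are illustrative assumptions.
PERMISSIVE = {"mit", "apache-2.0", "bsd-2-clause", "bsd-3-clause", "isc"}

def is_commercially_usable(record: dict) -> bool:
    """True only if the record has licenses and all are in the allowlist."""
    licenses = record.get("detected_licenses", [])
    return bool(licenses) and all(l.lower() in PERMISSIVE for l in licenses)

samples = [
    {"path": "a.py", "detected_licenses": ["MIT"]},
    {"path": "b.py", "detected_licenses": ["GPL-3.0"]},
    {"path": "c.py", "detected_licenses": []},  # unknown license: reject
]
usable = [s["path"] for s in samples if is_commercially_usable(s)]
print(usable)  # → ['a.py']
```

Note the conservative default: records with no detected license are rejected rather than assumed permissive, which is the safer posture when compliance matters.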

By leveraging technological advancements and effective datasets, StarCoder2 promises to elevate the capabilities of code generation, offering developers a more robust tool for their projects.
