AI2's New Cost-Effective Model: Open and Powerful Solutions for All

The Allen Institute for AI (AI2), in collaboration with Contextual AI, has launched an innovative open-source large language model (LLM) called OLMoE. This model aims to balance strong performance with cost-effectiveness.

OLMoE features a sparse mixture-of-experts (MoE) architecture with 7 billion total parameters, of which only 1 billion are active for each input token. It comes in two versions: OLMoE-1B-7B, a general-purpose base model, and OLMoE-1B-7B-Instruct, an instruction-tuned variant.
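
Because the weights are openly released, the models can be loaded with standard tooling. Below is a minimal usage sketch with Hugging Face transformers; the repository id follows the release naming described here but is an assumption, so check AI2's Hugging Face organization for the exact checkpoints.

```python
# Minimal sketch: loading an OLMoE checkpoint with Hugging Face transformers.
# The repo id is assumed from the naming in this article and may differ.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924"   # assumed id for the base model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Mixture-of-experts models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```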

Unlike many other MoE models, OLMoE is fully open-source. AI2 highlights the challenges in accessing other MoE models, as they often lack transparency regarding training data, code, or construction methods. “Most MoE models are closed source, providing limited insights into their training data or methodologies, which hinders the development of cost-efficient open MoEs that can rival closed-source models,” AI2 stated in their paper. This lack of accessibility presents a significant barrier for researchers and academics.

Nathan Lambert, an AI2 research scientist, noted on X (formerly Twitter) that OLMoE could support policy development, serving as a foundational tool as academic H100 clusters become available. He emphasized AI2’s commitment to releasing competitive open-source models, stating, “We’ve improved our infrastructure and data without altering our core goals. This model is truly state-of-the-art, not just the best on a couple of evaluations.”

Building OLMoE

In developing OLMoE, AI2 adopted a fine-grained routing approach with 64 small experts, of which only eight are activated for each input token. This configuration delivers performance comparable to other models while significantly reducing inference costs and memory requirements.
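
To make the idea concrete, here is a minimal sketch of fine-grained top-k expert routing in the spirit of that 64-expert, 8-active configuration. The dimensions, module names, and layer structure are illustrative assumptions, not code from the OLMoE release.

```python
# Sketch of a sparse MoE layer: a router scores 64 experts per token and
# only the top 8 are evaluated, so active compute is a fraction of total size.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_ff=1024, num_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                                   # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)           # routing distribution over experts
        weights, idx = probs.topk(self.top_k, dim=-1)       # keep 8 of 64 experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize kept weights
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                      # combine the chosen experts' outputs
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

layer = SparseMoELayer()
tokens = torch.randn(4, 1024)
print(layer(tokens).shape)   # torch.Size([4, 1024])
```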

OLMoE builds on AI2’s previous open-source model, OLMo 1.7-7B, which supported a context window of 4,096 tokens and was trained on the Dolma 1.7 dataset. For its own training, OLMoE drew on a diverse mix of data, including subsets of Common Crawl, Dolma CC, RefinedWeb, StarCoder, C4, Stack Exchange, OpenWebMath, Project Gutenberg, and Wikipedia.

AI2 claims that OLMoE “outperforms all existing models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B.” Benchmark results indicate that OLMoE-1B-7B often competes closely with models of 7 billion parameters or more, such as Mistral-7B, Llama 3.1-8B, and Gemma 2. Against other models with around 1 billion active parameters, OLMoE-1B-7B significantly outperformed open-source peers, including Pythia, TinyLlama, and even AI2’s own OLMo.

The Case for Open-Source MoE

AI2's mission includes making fully open-source AI models more accessible, particularly within the increasingly popular MoE architecture. Many developers are turning to MoE systems, as seen in Mistral’s Mixtral 8x22B and xAI’s Grok, and there is speculation that GPT-4 uses MoE as well. However, AI2 and Contextual AI point out that many existing AI models lack comprehensive transparency regarding their training data and codebases.

AI2 underscores the necessity for openness in MoE models, which introduce unique design challenges, such as determining the ratio of total to active parameters, deciding between numerous small experts or fewer large ones, sharing experts, and choosing appropriate routing algorithms.
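
The trade-off between total and active parameters is easy to see with rough arithmetic. The sketch below compares a "many small experts" layout with a "few large experts" one in a single MoE feed-forward layer; the dimensions are illustrative and do not correspond to any published model.

```python
# Back-of-the-envelope comparison of total vs. active parameters in one MoE
# feed-forward layer under two illustrative expert layouts.
def moe_ffn_params(d_model, d_ff, num_experts, top_k):
    per_expert = 2 * d_model * d_ff           # up- and down-projection weights
    return {
        "total": num_experts * per_expert,    # parameters held in memory
        "active": top_k * per_expert,         # parameters used per token
    }

many_small = moe_ffn_params(d_model=2048, d_ff=1024, num_experts=64, top_k=8)
few_large  = moe_ffn_params(d_model=2048, d_ff=8192, num_experts=8,  top_k=1)

for name, cfg in [("64 small experts, top-8", many_small),
                  ("8 large experts, top-1", few_large)]:
    print(f"{name}: total={cfg['total']/1e6:.0f}M, active={cfg['active']/1e6:.0f}M")

# Both layouts spend the same active compute per token and hold the same total
# weights, but the fine-grained layout gives the router far more expert
# combinations to choose from per token.
```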

Furthermore, the Open Source Initiative is actively addressing what constitutes openness for AI models, highlighting the importance of transparency in advancing the field.
