Open Source vs. Closed Models: Understanding the Real Costs of Implementing AI Solutions

The open-source release of Meta's large language model Llama 2 has garnered significant acclaim among developers and researchers, particularly for its accessibility. The Llama family has inspired several derivative systems, including Vicuna and Alpaca (both built on the original LLaMA) and Meta's own Llama 2 Long. However, operating Llama 2 can be considerably more expensive than using proprietary alternatives: reports indicate that numerous startups are seeing operating costs 50% to 100% higher with Llama 2 than with OpenAI's GPT-3.5 Turbo, although the cutting-edge GPT-4 remains pricier still. Both GPT-3.5 Turbo and GPT-4 underpin ChatGPT.

Sometimes, the cost differential can be staggering. The founders of the chatbot startup Cypher conducted tests using Llama 2 in August, incurring a hefty $1,200 in costs, while the same tests on GPT-3.5 Turbo only set them back $5.

Recently, OpenAI introduced a more economical model, GPT-4 Turbo, priced at one cent per 1,000 input tokens, a third of the price of the earlier 8K-context version of GPT-4. At its DevDay event, OpenAI encouraged developers to try the new model by offering each attendee $500 in free API credits. While Llama 2 is openly accessible, the significant difference in operational expenses may still dissuade companies from adopting it.
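The price gap compounds quickly at scale. As a minimal sketch, assuming the per-token rates cited above ($0.01 per 1,000 input tokens for GPT-4 Turbo versus $0.03 for GPT-4 8K) and a hypothetical workload of 50 million input tokens per month:

```python
def input_cost(tokens: int, price_per_1k: float) -> float:
    """Dollar cost of input tokens at a given price per 1,000 tokens."""
    return tokens / 1_000 * price_per_1k

# Assumed workload: 50M input tokens per month (illustrative, not measured).
monthly_tokens = 50_000_000

turbo_cost = input_cost(monthly_tokens, 0.01)    # GPT-4 Turbo rate
gpt4_8k_cost = input_cost(monthly_tokens, 0.03)  # older GPT-4 8K rate

print(f"GPT-4 Turbo: ${turbo_cost:,.2f}/month")
print(f"GPT-4 8K:    ${gpt4_8k_cost:,.2f}/month")
```

At these assumed rates the same workload costs $500 versus $1,500 per month, which is the threefold difference described above. Output tokens are billed separately and would widen or narrow the gap depending on the workload's mix.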

### Understanding the Cost Disparity

One key factor contributing to the higher costs associated with open-source models lies in the infrastructure used by companies. OpenAI can efficiently process millions of requests by batching them for simultaneous processing on high-performance chips. In contrast, startups like Cypher, which rely on open-source models and rent specialized servers through cloud providers, may not generate sufficient traffic to achieve similar efficiencies. This disparity limits their ability to harness the full potential of server capabilities.
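The batching effect can be sketched with back-of-the-envelope arithmetic. All figures below are illustrative assumptions (a $4 GPU-hour, a batch of 16 taking 1.5x as long as a single request), not measured numbers, but they show why per-request cost collapses once there is enough traffic to fill batches:

```python
GPU_HOUR_COST = 4.00              # assumed rental price for one GPU-hour
SOLO_REQUESTS_PER_HOUR = 3_600    # assumed: one request served at a time

def cost_per_request(batch_size: int, time_overhead: float) -> float:
    """Per-request cost when batch_size requests share one forward pass.

    time_overhead: how much longer a batch takes than a single request
    (e.g. 1.5 means a batch of N takes 1.5x the time of one request).
    """
    requests_per_hour = SOLO_REQUESTS_PER_HOUR / time_overhead * batch_size
    return GPU_HOUR_COST / requests_per_hour

print(f"unbatched:   ${cost_per_request(1, 1.0):.6f} per request")
print(f"batch of 16: ${cost_per_request(16, 1.5):.6f} per request")
```

Under these assumptions, batching 16 requests cuts the per-request cost by roughly a factor of ten. A startup whose traffic cannot fill such batches pays closer to the unbatched rate while still renting the whole GPU.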

The operational costs associated with open-source large language models can fluctuate dramatically, contingent on the specific tasks being performed, the volume of requests, and the level of customization required. For straightforward tasks such as summarization, costs can remain relatively low, while more complex functions may necessitate greater investment.

Bradley Shimmin, chief analyst for AI and data analytics, points out that there’s little transparency around the cost management strategies employed by OpenAI. “OpenAI likely benefits from economies of scale that are inaccessible to smaller enterprises attempting to host extensive models on cloud platforms like AWS or Azure,” he suggests.

### A Misalignment of Resources

In a recent analysis, Permutable.ai revealed its operational costs for utilizing OpenAI's technology, estimating about $1 million per year—20 times the cost of using in-house models. Wilson Chan, CEO of Permutable.ai, likens the use of ChatGPT for minor tasks to using a “sledgehammer to crack a nut”—effective yet excessively forceful. He cautions against the computational and financial resources tied to heavyweight models for routine tasks, stressing the importance of matching the AI model’s capability with practical needs to ensure cost-efficiency.

### Exploring Cost Structures

Operational expenses for large language models vary significantly, primarily based on their size. Llama 2 is available in several configurations, with the largest version boasting 70 billion parameters. Larger models require substantial computing power for training and execution, but they often deliver enhanced performance.

Victor Botev, CTO and co-founder at Iris.ai, notes that parameters can be optimized through techniques like quantization to reduce operational costs. While this can lower expenses, it carries the risk of diminishing response quality, so the decision must be carefully weighed according to user needs.
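The trade-off Botev describes can be seen in a toy example. The sketch below applies simple symmetric linear quantization, mapping float32 weights to int8 for a 4x memory reduction at the cost of rounding error; production schemes (e.g. GPTQ or AWQ) are considerably more sophisticated:

```python
import numpy as np

# Toy post-training quantization: float32 weights -> int8.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1_000).astype(np.float32)

# Symmetric linear quantization: scale so the largest weight maps to 127.
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

print(f"memory: {weights.nbytes} -> {quantized.nbytes} bytes")  # 4000 -> 1000
print(f"max rounding error: {np.abs(weights - dequantized).max():.5f}")
```

The rounding error is bounded by half the quantization step, which is exactly the quality loss that must be weighed against the 4x smaller memory footprint (and correspondingly cheaper hardware).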

For on-premises deployments, models with fewer than 100 billion parameters necessitate at least one DGX box, which costs around $200,000; amortized over a typical three-year lifespan, the annual hardware expense for running Llama 2 on-premises comes to approximately $65,000. In cloud settings, operational costs vary by model size: models below 15 billion parameters run about $1,000 per month ($12,000 annually), while models of around 70 billion parameters cost roughly $1,500 per month ($18,000 annually).
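These figures reconcile with simple arithmetic. As a back-of-the-envelope check, assuming the $200,000 DGX price is amortized over three years (the amortization period is an assumption, chosen because it reproduces the cited $65,000 figure) and the cloud rates quoted above:

```python
DGX_PRICE = 200_000          # cited price of one DGX box
AMORTIZATION_YEARS = 3       # assumed hardware lifespan

on_prem_annual = DGX_PRICE / AMORTIZATION_YEARS  # ~$66,700/yr
cloud_small_annual = 1_000 * 12                  # <15B params
cloud_large_annual = 1_500 * 12                  # ~70B params

print(f"on-prem (amortized): ${on_prem_annual:,.0f}/yr")
print(f"cloud, <15B params:  ${cloud_small_annual:,}/yr")
print(f"cloud, ~70B params:  ${cloud_large_annual:,}/yr")
```

On these numbers, cloud hosting of even a 70-billion-parameter model undercuts on-premises hardware by a wide margin, though cloud costs scale with usage while the DGX cost is fixed.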

Out of the box, most models fail to meet companies' quality standards, prompting the need for various tuning techniques. Prompt tuning is the least costly method, priced from $10 to $1,000, while instruction tuning ranges from $100 to $10,000. Fine-tuning, which alters the model's underlying weights, is less predictable: it averages around $100,000 for smaller models (1-5 billion parameters) and can reach millions for larger configurations.

### A Shift Towards Smaller Models

In light of these considerations, the emergence of smaller, more cost-effective models for specific applications offers a promising alternative. Variants of Llama 2 with seven billion and 13 billion parameters are already available, and innovative models like Microsoft’s Phi 1.5 and EleutherAI's Pythia-1b are gaining traction.

Yet, as Omdia's chief analyst, Lian Jye Su, highlights, open-source offerings are seldom inexpensive, particularly once customization or enhancements are involved. Still, because OpenAI's models are proprietary, some businesses prefer open-source alternatives to avoid licensing fees or revenue sharing, which relegates raw model cost to a secondary concern.

Anurag Gurtu, CPO of StrikeReady, emphasizes that startups must balance model costs with potential returns on investment. “AI models can foster innovation, enhance user experiences, and optimize operations. As we advance, the emergence of more efficient models and cost-effective solutions stands to make AI more accessible for startups and developers,” he predicts.

### Access to Computing Resources

Another significant factor influencing operational costs is access to hardware. In the current competitive landscape, companies are eager to deploy AI technologies, necessitating robust computing resources. However, the demand has outpaced supply. Nvidia, a market leader, recently reported considerable demand for its GPUs, with substantial deliveries in the second quarter. As competitors like AMD and Intel gear up with their own AI chips, the need for dependable access to compute power becomes vital.

With limited hardware availability, companies might face inflated costs to fulfill their computational requirements. Rentable GPUs from providers such as Hugging Face, NexGen Cloud, and AWS are available, yet the intensive requirements of models like Llama 2 necessitate powerful computing resources.

Tara Waters, chief digital officer and partner at Ashurst, notes that the consumption-based pricing of public models may deter some startups from allowing potential customers to explore and trial prior to purchase. While the availability of open-source models could alleviate some challenges, it brings new hurdles, such as the need for appropriate infrastructure to host and deploy these models effectively.

As the landscape evolves, innovative strategies are emerging to manage AI model consumption and costs. Approaches such as applying prompt engineering against a hosted API rather than self-hosting a model, or building intermediary layers that reuse results for repetitive queries, demonstrate the ingenuity required to navigate the current AI ecosystem.
