If you were looking to elevate the visibility of your prominent tech company with a budget of $10 million, how would you choose to invest it? A flashy Super Bowl ad? A Formula 1 sponsorship? Another compelling option might be to train a generative AI model. Although this approach isn't traditional marketing, generative models attract significant attention and act as funnels that steer customers toward a vendor's bread-and-butter products and services.
Consider Databricks' newly announced DBRX model, which is comparable to OpenAI’s GPT series and Google’s Gemini. Now accessible on GitHub and the AI development platform Hugging Face, DBRX includes both a base version (DBRX Base) and a fine-tuned version (DBRX Instruct) that can be adapted using public, custom, or proprietary datasets.
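For readers who want to kick the tires, a minimal sketch of loading the instruct-tuned model through Hugging Face's transformers library might look like the following. This is not an official Databricks example; it assumes the repo name as published on Hugging Face, access to the gated repository, and hardware along the lines described later in this piece:

```python
# Minimal sketch: querying DBRX Instruct via Hugging Face transformers.
# Assumes access to the gated "databricks/dbrx-instruct" repo (you may
# need to log in with `huggingface-cli login`) and substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct")
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dbrx-instruct",
    device_map="auto",           # shard the weights across available GPUs
    torch_dtype=torch.bfloat16,  # half-precision weights
    trust_remote_code=True,      # may be required on older transformers versions
)

messages = [{"role": "user", "content": "What is Databricks?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```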
“DBRX was designed to deliver valuable information across a wide range of topics,” stated Naveen Rao, Vice President of Generative AI at Databricks, in an interview. “While DBRX has been optimized for English, it can also converse and translate in various languages, including French, Spanish, and German.”
Databricks characterizes DBRX as "open source," in a manner similar to Meta's Llama 2 and the models from AI startup Mistral. However, the true definition of "open source" for these models is widely debated. Databricks claims it invested about $10 million and two months into training DBRX, asserting that it “outperform[s] all existing open-source models on standard benchmarks,” according to their press release.
The catch, however, is that utilizing DBRX effectively all but requires being a Databricks customer. Running the model in its standard configuration calls for a server or PC with at least four Nvidia H100 GPUs, for a combined 320GB of GPU memory. A single H100 can cost tens of thousands of dollars, putting the hardware out of reach for many developers and small business owners.
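A bit of back-of-the-envelope arithmetic shows why the requirement lands where it does. The sketch below assumes DBRX's published total parameter count of roughly 132 billion and 16-bit weights; real deployments need additional headroom for activations and the KV cache:

```python
# Rough memory math for serving DBRX (assumes ~132B total parameters,
# per Databricks' published figures, stored in 16-bit precision).
total_params = 132e9          # total parameter count
bytes_per_param = 2           # bfloat16 / float16
weights_gb = total_params * bytes_per_param / 1e9
print(f"Weights alone: ~{weights_gb:.0f} GB")            # ~264 GB

h100_memory_gb = 80
gpus_needed = -(-weights_gb // h100_memory_gb)           # ceiling division
print(f"Minimum H100s just for weights: {gpus_needed:.0f}")  # 4 (320 GB total)
```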
Alternatively, you can run the model on a third-party cloud service, but the hardware requirements remain high. For instance, Google Cloud offers only one instance type with H100 chips. While other clouds may charge less, running large models like DBRX generally isn’t inexpensive.
There is also fine print to consider. Databricks warns that companies with over 700 million active users will face "certain restrictions" similar to those imposed by Meta for Llama 2, and all users must comply with terms that ensure responsible use of DBRX. As of the time of publication, specific details regarding these terms hadn’t been disclosed.
Databricks aims to mitigate these obstacles with its Mosaic AI Foundation Model, which not only allows running DBRX but also provides a training framework to fine-tune the model with custom data. Customers can privately host DBRX through Databricks' Model Serving option or partner with the company to deploy it on their chosen hardware.
Rao emphasized, “Our focus is to make the Databricks platform the top choice for custom model development, ultimately benefiting us with more users.” He added, "DBRX showcases our advanced pre-training and tuning infrastructure, enabling customers to develop their own models efficiently. It's an intuitive starting point for engaging with Databricks’ Mosaic AI generative tools. Out-of-the-box, DBRX is highly capable and can be fine-tuned for outstanding performance at better costs than large, closed models.”
Databricks claims that DBRX runs up to twice as fast as Llama 2, thanks in part to its mixture of experts (MoE) architecture. An MoE model splits its computation across specialized "expert" subnetworks and routes each input to only a few of them, so just a fraction of the model's total parameters is active for any given token — which is how a very large model can still serve answers quickly. While many MoE models utilize eight experts, DBRX boasts 16, which Databricks argues improves quality.
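To make the routing idea concrete, here is a toy MoE layer in PyTorch. It is purely illustrative, not DBRX's actual implementation; the expert counts mirror the figures Databricks has described (16 experts, a handful active per token):

```python
# Toy mixture-of-experts layer: a learned router scores the experts for
# each token and only the top-k experts actually run on that token, so
# a small fraction of the layer's parameters does the work per token.
# Illustrative sketch only -- not DBRX's real code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        scores = self.router(x)                  # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize the chosen k
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # run only selected experts
            for e in idx[:, k].unique():
                mask = idx[:, k] == e
                out[mask] += weights[mask, k:k+1] * self.experts[int(e)](x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(8, 64)   # 8 tokens, 64-dim embeddings
print(layer(tokens).shape)    # torch.Size([8, 64])
```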
However, quality can be subjective. Databricks asserts that DBRX excels over Llama 2 and Mistral's models in specific language understanding, programming, and logic tests, but it falls short of the leading generative AI model, OpenAI's GPT-4, in most categories, except for niche tasks like generating database programming languages such as SQL.
It's also worth underscoring, as some critics on social media have, that such comparisons only go so far. GPT-4 cost vastly more to train, and the two models target different audiences, with DBRX positioned as an "open source" solution aimed at enterprises.
At the same time, DBRX shares some challenges with flagship models like GPT-4. Its operational costs are prohibitive for many, its training datasets aren’t available for public scrutiny, and it doesn’t fit the strictest criteria for open source.
Rao acknowledged DBRX has limitations, notably that, like most generative AI systems, it can produce "hallucinated" answers to queries despite efforts in safety testing. The model learns to associate words and phrases with certain concepts, and when those associations aren't entirely accurate, neither are its answers.
Moreover, DBRX is not multimodal, meaning it can only handle text and cannot process or generate images. The specific datasets used for training remain undisclosed, with Rao stating only that no customer data was part of the training process. He noted, “We utilized a diverse collection of openly available datasets that the community widely recognizes and employs.”
When asked about potential biases in the training data or whether any datasets were copyrighted, Rao did not answer directly, saying only, "We've been cautious in our data selection and have conducted red teaming exercises to address the model's weaknesses." Generative AI models often mirror their training data, which raises concerns for users regarding copyright infringement and bias. Without clear policies addressing legal liability for potential IP violations, organizations could be inadvertently exposed.
While some companies provide indemnification for legal issues stemming from their generative AI models, Databricks currently does not, although Rao indicated they are reviewing possible scenarios for such policies.
Given these challenges and DBRX's shortcomings, its appeal is largely limited to existing and prospective Databricks customers. Rivals like OpenAI provide more compelling technologies at competitive pricing, alongside other models that align more closely with conventional definitions of "open source."
Rao promised continued refinement of DBRX and forthcoming releases as the company’s Mosaic Labs R&D team explores new generative AI advancements.
“DBRX is advancing the open source model landscape and setting a benchmark for future developments,” he said. “We plan to release updated versions as we enhance output quality in terms of reliability, safety, and bias. Our vision is for the open model to serve as a foundation for customers to create custom capabilities using our tools.”
Given DBRX's current positioning in the competitive landscape, it seems there’s still a long journey ahead.
This article has been corrected to clarify that the model required two months for training and to remove an earlier misreference to Llama 2 in a specific paragraph. We regret any errors.