This article is part of the VB Special Issue titled “Fit for Purpose: Tailoring AI Infrastructure.”
As enterprises increasingly build AI applications and agents, it is becoming clear that the best results come from mixing and matching language models and databases.
But switching an application from, say, Llama 3 to Mistral takes real engineering finesse. The key lies in the orchestration layer, the intermediary that connects foundation models to applications and manages the API calls that execute tasks.
This orchestration layer primarily comprises software solutions like LangChain and LlamaIndex, which facilitate database integration. However, a crucial question arises: Is this layer solely software-based, or does hardware play a significant role beyond simply powering the AI models?
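To make the idea concrete, here is a minimal sketch of what an orchestration layer does at the routing level. The `Orchestrator` class, the backend functions, and the model names used as dictionary keys are all illustrative assumptions, not the API of LangChain or LlamaIndex; real frameworks wrap production model clients rather than local stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical backends standing in for real model API clients
# (e.g. a Llama 3 or Mistral endpoint) -- illustration only.
def call_llama3(prompt: str) -> str:
    return f"[llama-3] {prompt}"

def call_mistral(prompt: str) -> str:
    return f"[mistral] {prompt}"

@dataclass
class Orchestrator:
    """Routes application requests to whichever model is configured."""
    backends: Dict[str, Callable[[str], str]]
    active: str

    def complete(self, prompt: str) -> str:
        # The application calls one stable interface; the layer
        # decides which underlying model actually serves it.
        return self.backends[self.active](prompt)

router = Orchestrator(
    backends={"llama3": call_llama3, "mistral": call_mistral},
    active="llama3",
)
print(router.complete("Summarize Q3 sales"))  # served by Llama 3
router.active = "mistral"                     # swap the model, not the app
print(router.complete("Summarize Q3 sales"))  # same app code, new model
```

The point of the sketch is that switching models becomes a configuration change rather than an application rewrite, which is exactly the indirection the orchestration layer provides.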
The answer is clear: hardware is essential in supporting frameworks like LangChain and the databases that underpin AI applications. Enterprises need robust hardware stacks capable of managing high-volume data flows while also considering devices that can perform substantial data center tasks on-site.
“While the AI middle layer is primarily a software issue, hardware providers can significantly influence its performance and efficiency,” states Scott Gnau, head of data platforms at InterSystems.
AI infrastructure experts emphasize that although software is fundamental to AI orchestration, its effectiveness hinges on the ability of servers and GPUs to handle extensive data movement. For the orchestration layer to function optimally, the underlying hardware must be smart and efficient, with high-bandwidth, low-latency connections to manage heavy workloads.
“This orchestration layer requires fast chips,” explains Matt Candy, managing partner of generative AI at IBM Consulting. “I envision a future where silicon, chips, and servers can optimize based on the model's type and size as the orchestration layer dynamically switches between tasks.”
GPUs already on the market can support these needs effectively.
John Roese, Global CTO and Chief AI Officer at Dell, notes, “It’s a hardware and software issue. People often forget that AI manifests as software, which operates on hardware. AI software is the most demanding we've ever created, necessitating an understanding of performance metrics and computational requirements.”
While the AI middle layer demands rapid, powerful hardware, new specialized equipment is not necessary beyond existing GPUs and chips.
“Certainly, hardware is a crucial enabler, but I doubt there’s any specialized hardware that will drive major advancements aside from GPUs to enhance model performance,” Gnau points out. “Optimization will stem from software and architecture, minimizing data movement.”
The emergence of AI agents has heightened the need to strengthen this middle layer. As AI agents communicate with each other and initiate multiple API calls, an effective orchestration layer is vital for managing this traffic with swift servers.
“This layer ensures seamless API access to all types of AI models and technologies, enhancing the overall user experience,” says Candy. “I refer to it as an AI controller within the middleware stack.”
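The traffic-management role Candy describes can be sketched as a controller that fans a task out to several agents concurrently and collects their results. Everything here is an assumption for illustration: the agent names, the `controller` function, and the use of `asyncio.sleep` as a stand-in for a real network round trip to a model API.

```python
import asyncio

async def call_agent(name: str, task: str) -> str:
    """Hypothetical agent call -- in practice, an API request to a model."""
    await asyncio.sleep(0)  # stand-in for network latency
    return f"{name}: {task} done"

async def controller(task: str) -> list[str]:
    # The "AI controller" dispatches one task to multiple agents
    # concurrently and waits for all of them to respond.
    agents = ["research_agent", "drafting_agent", "review_agent"]
    return list(await asyncio.gather(*(call_agent(a, task) for a in agents)))

results = asyncio.run(controller("prepare quarterly summary"))
print(results)
```

Concurrency is the relevant design choice here: when agents call each other and issue many API requests, the controller's job is to keep those calls in flight simultaneously rather than serializing them, which is why fast, low-latency hardware underneath it matters.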
AI agents are a hot topic in the industry and are likely to shape the development of enterprise AI infrastructure in the coming years.
Roese adds another consideration for enterprises: on-device AI. Companies must plan for scenarios where AI agents need to operate locally, especially if connectivity is lost.
“The critical question is where operations occur,” Roese suggests. “This is where concepts like the AI PC come into play. When a collection of agents collaborates on your behalf, do they all need to be centralized?”
He discusses Dell's exploration of on-device “concierge” agents that keep operations running even during internet outages.
Generative AI has driven an explosion in the tech stack, as new service providers emerge offering GPU capacity, databases and AIOps services. However, this expansion may not be permanent, warns Uniphore CEO Umesh Sachdev.
“While the tech stack has exploded, I believe we will witness a normalization phase,” Sachdev predicts. “Ultimately, organizations will consolidate resources in-house, and GPU demand will stabilize. The proliferation of layers and vendors is typical of new technologies, and we will see similar trends with AI.”
For enterprises, the best practice is to consider the entire AI ecosystem—from hardware to software—to ensure effective AI workflows.