This article is part of a VB Special Issue titled “Fit for Purpose: Tailoring AI Infrastructure.”
Data centers underpin the internet we use daily, enabling major companies like Netflix and Google to deliver digital services to users. As enterprises increasingly take on advanced AI workloads, traditional CPU-centric servers are being augmented with specialized chips known as "co-processors."
The primary purpose of these co-processors is to augment server computing capacity, enabling them to handle the complex demands of AI training, inference, database acceleration, and network functions. In recent years, Graphics Processing Units (GPUs), particularly from Nvidia, have emerged as the preferred choice for co-processors due to their unparalleled speed in processing vast amounts of data. A study by Futurum Group reveals that GPUs accounted for 74% of the co-processors driving AI use cases in data centers last year.
This dominance is projected to rise, with GPU revenues expected to surge by 30% annually, reaching $102 billion by 2028. However, while GPUs excel in accelerating large-scale AI workloads—such as training massive language models or genome sequencing—they come with high total ownership costs. For instance, Nvidia’s flagship GB200 "superchip," which combines a Grace CPU with two B200 GPUs, is projected to cost between $60,000 and $70,000, and a server with 36 of these superchips may exceed $2 million.
Not every organization can justify that price tag. Many IT managers are now seeking technologies that handle low- to medium-intensity AI workloads while prioritizing total cost of ownership, scalability, and ease of integration. As AI models mature, the focus is shifting toward inference and toward optimizing performance for specific applications such as image recognition and recommender systems, all while keeping costs in check.
This is where specialized AI processors and accelerators—developed by chipmakers, startups, and cloud providers—come into play.
What Are AI Processors and Accelerators?
AI processors and accelerators are specialized chips within server CPU ecosystems that focus on specific AI functions. They mainly fall into three architectural categories: Application-Specific Integrated Circuits (ASICs), Field-Programmable Gate Arrays (FPGAs), and Neural Processing Units (NPUs).
ASICs have been around for a while and are custom-designed for a particular task (AI-related or otherwise), while FPGAs can be reprogrammed after manufacturing to implement custom logic. NPUs, by contrast, are purpose-built to accelerate AI/ML workloads such as neural network inference and training.
“Accelerators can perform various functions individually, and with wafer-scale or multi-chip ASIC designs, they can handle numerous applications,” explains Daniel Newman, CEO of Futurum Group. NPUs are a prime example: specialized chips that handle the matrix math at the heart of neural networks while drawing less power.
For targeted applications, ASICs and NPUs often surpass GPUs on both cost and power consumption.
“GPU designs primarily concentrate on Arithmetic Logic Units (ALUs) to perform multiple calculations simultaneously, whereas AI accelerators focus on Tensor Processor Cores (TPCs). The performance comparison between AI accelerators and GPUs is largely influenced by the fixed function of their designs,” says Rohit Badlaney, general manager for IBM’s cloud and industry platforms.
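To make that distinction concrete, the sketch below shows the operation these designs compete on: a dense neural-network layer reduces to one large matrix multiply, the fixed pattern that tensor cores and NPUs execute in dedicated hardware while a GPU's ALUs run it as many general-purpose floating-point operations. The shapes and the use of NumPy here are illustrative only and are not tied to any vendor's stack.

```python
import numpy as np

# A single dense layer is essentially one matrix multiply plus a bias add.
# This is the fixed pattern that tensor cores and NPUs are built to execute
# in hardware, whereas a GPU's ALUs run it as many general-purpose FLOPs.
batch, d_in, d_out = 32, 4096, 4096
x = np.random.randn(batch, d_in).astype(np.float32)   # activations
w = np.random.randn(d_in, d_out).astype(np.float32)   # weights
b = np.zeros(d_out, dtype=np.float32)                  # bias

y = x @ w + b  # roughly batch * d_in * d_out * 2 floating-point operations

flops = 2 * batch * d_in * d_out
print(f"One layer: {flops / 1e9:.1f} GFLOPs for a {batch}x{d_in} @ {d_in}x{d_out} matmul")
```

A modern model stacks thousands of these layers, which is why a chip built around that one pattern can outperform more general-purpose silicon on it.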
IBM adopts a hybrid cloud model, utilizing a mix of GPUs and AI accelerators—including offerings from Nvidia and Intel—to help enterprises meet their unique workload requirements efficiently.
“Our solutions aim to transform how enterprises, developers, and the open-source community leverage generative AI. AI accelerators are vital for clients deploying generative AI,” Badlaney explains. While GPU systems excel in large model training and fine-tuning, many AI tasks can be effectively managed by accelerators at a lower cost.
For example, IBM Cloud virtual servers use Intel’s Gaudi 3 accelerator, designed specifically for inferencing and memory-intensive workloads. The company also plans to deploy the accelerator for fine-tuning and smaller training runs.
“AI accelerators and GPUs can effectively handle similar workloads, including large language models and image generation. However, performance benefits hinge on the design of the hardware provider. For instance, the Gaudi 3 AI accelerator enhances compute power, memory bandwidth, and architecture-based power efficiency,” Badlaney elaborates, highlighting the associated price-performance advantages.
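As a rough illustration of how such an accelerator is addressed from standard tooling, here is a minimal inference sketch assuming the Intel Gaudi (Habana) PyTorch bridge is installed on the instance; the toy model, shapes, and lazy-mode mark_step() call are placeholders for illustration, not a description of IBM's actual deployment.

```python
import torch
import habana_frameworks.torch.core as htcore  # Intel Gaudi (Habana) PyTorch bridge

device = torch.device("hpu")  # Gaudi devices are exposed to PyTorch as "hpu"

# Placeholder model: a small MLP standing in for a real inference workload.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
).to(device).eval()

batch = torch.randn(64, 1024).to(device)

with torch.no_grad():
    logits = model(batch)
    htcore.mark_step()  # in lazy mode, flushes the accumulated graph to the accelerator

print(logits.shape)  # torch.Size([64, 10])
```

In practice, higher-level serving frameworks handle most of this device plumbing, so the application-level changes needed to target an accelerator are typically small.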
In addition to Intel, a slew of other AI accelerators are gaining traction in the market. These include custom chips developed by cloud giants like Google, AWS, and Microsoft, as well as specialized products—such as NPUs—from startups like Groq, Graphcore, SambaNova Systems, and Cerebras Systems, each offering unique capabilities that challenge GPU dominance.
Tractable, a company that applies AI to property and vehicle damage analysis, reports significant performance gains from Graphcore’s Intelligence Processing Unit (IPU)-POD system compared with the GPUs it previously used.
“We observed a nearly 5X speed increase,” stated co-founder and CTO Razvan Ranca in a blog post. “This acceleration enables researchers to conduct substantially more experiments, enhancing the entire research and development process, yielding better product models.”
AI accelerators are also powering training workloads. Aleph Alpha’s supercomputer, for example, uses Cerebras’ CS-3 system, built around a third-generation Wafer Scale Engine with 900,000 AI cores, to develop next-gen AI models. Google’s custom TPU v5p ASIC is likewise used for training at companies like Salesforce and Lightricks.
Selecting the Right Accelerators
Given the variety of AI processors available beyond GPUs, IT managers must carefully consider their options. Certain chips might excel in performance and efficiency but may be limited in handling specific AI tasks due to architectural constraints, while others may offer broader capabilities with less pronounced total cost benefits compared to GPUs.
Experts recommend that chip selection should be guided by the scale and type of workloads, data demands, flexibility for future changes, and associated costs.
Daniel Kearney, CTO at Sustainable Metal Cloud, emphasizes the importance of conducting benchmarks to evaluate price-performance benefits and ensuring that teams are familiar with the surrounding software ecosystem of the chosen AI accelerators.
“When detailed workload information may not be readily available, we suggest thorough benchmarking, real-world testing, and peer-reviewed data to inform the decision-making process for selecting the optimal AI accelerator,” he advises. “This upfront investigation can lead to substantial time and cost savings, particularly for large training jobs.”
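As a sketch of the kind of price-performance comparison Kearney describes, the short script below converts measured throughput and hourly instance pricing into cost per million inferences. Every number in it is a hypothetical placeholder; a real evaluation would use measured results from your own workloads (or standardized suites such as MLPerf) and actual vendor pricing.

```python
# Toy price-performance comparison of candidate accelerator instances.
# All figures below are hypothetical placeholders; real decisions should use
# measured throughput from your own workload and actual instance pricing.
candidates = {
    "gpu_instance":         {"hourly_usd": 12.00, "samples_per_sec": 4200},
    "accelerator_instance": {"hourly_usd": 7.50,  "samples_per_sec": 3100},
}

for name, c in candidates.items():
    samples_per_hour = c["samples_per_sec"] * 3600
    usd_per_million = c["hourly_usd"] / samples_per_hour * 1_000_000
    print(f"{name}: ${usd_per_million:.2f} per million inferences")
```

The same calculation extends naturally to tokens per second for language-model serving or images per second for vision workloads, which makes it easy to compare candidates on the metric that actually drives the bill.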
With growing demand for inference jobs, the global market for AI hardware—comprising AI chips, accelerators, and GPUs—is projected to expand by 30% annually, reaching an estimated $138 billion by 2028.