Advancements in AI Training: A Cost-Effective Approach from Rice University
Artificial Intelligence (AI) serves as the backbone for popular digital assistants like Alexa and Siri, which rely on deep learning. However, training these AI models is often expensive and time-consuming. Researchers at Rice University have developed an innovative method that trains deep neural networks faster and more cheaply by running on commodity CPUs.
Traditionally, companies have turned to Graphics Processing Units (GPUs) for deep learning tasks, with top-tier platforms costing upwards of $100,000. In response to this challenge, the Rice team created the Sub-Linear Deep Learning Engine (SLIDE), an algorithm that trains deep networks to comparable accuracy without the need for specialized hardware.
In experiments comparing SLIDE on a 44-core Xeon-class CPU to a high-end GPU using Google's TensorFlow, the CPU completed a complex training workload in just one hour, while the GPU required three and a half hours. (Note: the reference to a "44-core Xeon-class CPU" likely refers to a 22-core, 44-thread CPU.)
SLIDE distinguishes itself by adopting a fundamentally different approach to deep learning. Whereas GPU-based training updates all of a network's millions or billions of neurons for every input, SLIDE selectively trains only the neurons relevant to each specific case. According to Anshumali Shrivastava, assistant professor at Rice's Brown School of Engineering, SLIDE also offers data-parallelism benefits. For instance, when training on two different data instances—say, an image of a cat and an image of a bus—SLIDE can independently update the small, mostly disjoint sets of neurons relevant to each case, keeping the CPU's cores busy.
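To make the selective-update idea concrete, here is a minimal NumPy sketch. All names and the selection rule are our own simplifications: the real engine locates relevant neurons with locality-sensitive hash tables in sub-linear time, whereas this toy version scores every neuron. It only illustrates why two different inputs can update the network largely independently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: 10,000 neurons over 128-dimensional inputs.
n_neurons, dim = 10_000, 128
W = rng.standard_normal((n_neurons, dim)) * 0.01

def select_active(x, k=50):
    """Stand-in for SLIDE's neuron selection: keep the k neurons that
    respond most strongly to the input. (Illustrative only -- SLIDE
    finds these via hash-table lookups instead of a full O(n) pass.)"""
    scores = W @ x
    return np.argpartition(scores, -k)[-k:]

def sparse_update(x, lr=0.01, k=50):
    """Touch only the selected neurons' weights for this one input."""
    active = select_active(x, k)
    W[active] += lr * np.outer(np.tanh(W[active] @ x), x)
    return active

# Two different inputs (the article's cat and bus) activate mostly
# disjoint neuron subsets, so their updates rarely collide and can
# proceed in parallel across CPU cores.
cat_neurons = sparse_update(rng.standard_normal(dim))
bus_neurons = sparse_update(rng.standard_normal(dim))
overlap = set(cat_neurons.tolist()) & set(bus_neurons.tolist())
```

With only 50 of 10,000 neurons touched per input, each update costs a tiny fraction of a dense pass, which is the source of SLIDE's speedup on CPUs.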
However, this method presents its own challenges, particularly concerning memory requirements. "Compared to GPU, SLIDE requires significant memory," noted Shrivastava. "Without careful management of cache hierarchy, we risk cache thrashing, leading to inefficiencies." After publishing their initial findings, the team collaborated with Intel, which helped enhance SLIDE's speed by approximately 50%.
While SLIDE shows great promise, it is unlikely to replace GPU-based training in the near future, since scaling training across multiple GPUs in a single system is more straightforward than distributing it across many CPUs. Nonetheless, SLIDE expands the possibilities for AI training, making it more accessible and efficient.
Shrivastava emphasizes that there is still much potential for optimization: "We've just scratched the surface. We have yet to explore vectorization or built-in CPU accelerators like Intel Deep Learning Boost. There are numerous techniques we can apply to enhance performance further." He concludes, "Our algorithm may be the first to demonstrate superior performance compared to GPUs, but I hope it inspires even more innovative approaches in the field of deep learning."