There are multiple approaches to AI training, fine-tuning, and inference at the edge.
One alternative to the traditional GPU is the Neural Processing Unit (NPU) developed by Kneron.
At the Computex conference in Taiwan, Kneron unveiled its next-generation silicon and server technology aimed at edge AI inference and fine-tuning. Founded in 2015, Kneron counts Qualcomm and Sequoia Capital among its investors. In 2023, the company launched the KL730 NPU to help address the global GPU shortage. Now, alongside the new KL830 and a preview of the KL1140, slated for release in 2025, Kneron is expanding its AI server line with the KNEO 330 Edge GPT server, which supports offline inference.
Kneron’s innovations are part of a niche yet expanding group of companies, including Groq and SambaNova, that seek alternatives to GPUs to enhance the power efficiency of AI workloads.
Edge AI and Private LLMs Powered by NPUs
A central goal of Kneron's latest update is to enable private GPT servers that can be deployed on-premises. This frees organizations from depending on large cloud-connected systems, since the Kneron KNEO system runs inference locally at the network's edge.
CEO Albert Liu shared that the KNEO 330 system integrates multiple KL830 edge AI chips into a compact server, promising affordable on-premises GPT deployments for enterprises. The earlier KNEO 300 system, powered by the KL730, is already utilized by major institutions like Stanford University.
The KL830 chip, positioned between the previous KL730 and the future KL1140, is specifically engineered for language models. It can be cascaded to support larger models while ensuring low power consumption.
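One way to picture cascading is splitting a model's layers into contiguous slices, with each chip running one slice and passing activations to the next. The sketch below is purely illustrative; Kneron has not published its cascading interface, and all names here are hypothetical.

```python
# Hypothetical sketch of chip cascading: partition a model's layers across
# several edge NPUs so each chip runs one contiguous slice.
def partition_layers(num_layers: int, num_chips: int) -> list[range]:
    """Split model layers into contiguous slices, one per chip."""
    base, extra = divmod(num_layers, num_chips)
    slices, start = [], 0
    for i in range(num_chips):
        size = base + (1 if i < extra else 0)  # spread any remainder evenly
        slices.append(range(start, start + size))
        start += size
    return slices

# A 32-layer model spread over four cascaded KL830s:
print(partition_layers(32, 4))  # four slices of 8 layers each
```

In a real deployment the partitioning would also have to account for each chip's memory and the cost of moving activations between chips, but the basic idea of scaling to larger models by adding chips is the same.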
New Tools for Edge AI Training and Fine-Tuning
In addition to hardware, Kneron emphasizes software. The company has built tools for training and fine-tuning models targeted at its chips. Liu said Kneron combines multiple open models, fine-tuning them for optimal performance on NPUs.
Moreover, Kneron now offers a neural compiler that allows users to transfer models trained with frameworks such as TensorFlow, Caffe, or MXNet directly onto Kneron chips.
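Conceptually, such a compiler maps a trained model's framework-level operations onto the NPU's native kernels (quantizing and scheduling along the way). The toy below shows only that op-mapping shape; it is not Kneron's actual toolchain API, and every name in it is a placeholder.

```python
# Toy stand-in for a neural compiler: map framework ops to NPU kernels.
# A real compiler also quantizes weights and schedules memory/compute.
from dataclasses import dataclass

@dataclass
class CompiledModel:
    target: str
    ops: list[str]

def compile_for_npu(framework_ops: list[str], target: str = "KL830") -> CompiledModel:
    # Hypothetical op mapping; unsupported ops fall back to the host CPU.
    op_map = {"Conv2D": "npu_conv", "MatMul": "npu_gemm", "Relu": "npu_act"}
    return CompiledModel(target, [op_map.get(op, "cpu_fallback") for op in framework_ops])

model = compile_for_npu(["Conv2D", "Relu", "MatMul", "Softmax"])
print(model.ops)  # ['npu_conv', 'npu_act', 'npu_gemm', 'cpu_fallback']
```

The fallback path matters in practice: any op the NPU cannot execute natively determines how much of the model actually benefits from the accelerator.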
Their hardware also supports Retrieval-Augmented Generation (RAG) workflows. Liu highlighted that Kneron's chips employ a unique architecture that reduces the memory footprint of the large vector databases RAG requires, allowing RAG to run efficiently at lower power.
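For readers unfamiliar with the pattern, a RAG pipeline retrieves the most relevant document for a query and feeds it to the language model as context. The generic sketch below uses naive token overlap as the similarity score; production systems use embedding vectors, which is where the large vector databases mentioned above come in.

```python
# Minimal RAG retrieval sketch (generic, not Kneron-specific): pick the
# document with the most query-token overlap, then build the augmented prompt.
def retrieve(query: str, docs: list[str]) -> str:
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

docs = [
    "The KL830 targets language models at the edge.",
    "RAG pairs a retriever with a generator model.",
]
context = retrieve("how does RAG work", docs)
prompt = f"Context: {context}\nQuestion: how does RAG work"
print(prompt)
```

Swapping the overlap score for nearest-neighbor search over embeddings gives the full-scale version; the memory cost of storing those embeddings is exactly what edge hardware must keep in check.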
Kneron’s Competitive Edge: Low Power Consumption
A standout feature of Kneron’s technology is its remarkably low power consumption.
“I think the main difference is our power consumption is so low,” Liu stated.
The new KL830 draws a peak of just 2 watts while delivering consolidated calculation power (CCP) of up to 10 eTOPS at 8-bit precision. That low draw lets Kneron's chips be integrated into a variety of devices, including PCs, without extra cooling.
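The efficiency claim follows directly from the two figures above: peak compute divided by peak power.

```python
# Back-of-envelope efficiency from the quoted KL830 figures.
peak_etops = 10   # peak compute at 8-bit precision (eTOPS)
peak_watts = 2    # peak power consumption (W)
print(peak_etops / peak_watts)  # 5.0 eTOPS per watt
```

Vendors measure "effective" TOPS differently, so this ratio is most useful for comparing chips within Kneron's own lineup rather than against GPU datasheets.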