In an increasingly tense geopolitical landscape, China faces significant barriers to acquiring high-end AI chips. The situation was underscored on July 22 by two conflicting reports about Nvidia: on one hand, the U.S. government was weighing new trade restrictions that could block Nvidia from selling its China-specific HGX-H20 AI GPU there, potentially costing the company around $12 billion in revenue. On the other hand, Nvidia was reportedly developing a brand-new GPU for the Chinese market, named B20, based on its recently released "Blackwell" architecture. The U.S. stance is clear: it seeks to comprehensively restrict China's access to high-end AI chips in order to preserve its dominance in the AI sector.
In response, China has begun looking to Tensor Processing Units (TPUs) as an alternative. TPUs are application-specific integrated circuits (ASICs) built to accelerate tensor operations; Google first announced them in 2016. In deep learning, tensors—multi-dimensional arrays—are ubiquitous, and TPUs are designed to handle computations on them with high efficiency. They integrate large arrays of matrix computation units, allowing vast numbers of multiply-accumulate operations to run in parallel and significantly boosting computational throughput.
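The workload those matrix units accelerate is easy to state in code. The sketch below (illustrative only; the function name and shapes are my own, not from any TPU API) shows a dense-layer forward pass, which reduces to one large matrix multiply plus an add—exactly the kind of tensor operation a TPU's matrix units execute in parallel:

```python
import numpy as np

def dense_forward(x, w, b):
    """Dense-layer forward pass: y = x @ w + b.
    On a TPU, the matrix product is dispatched to dedicated
    matrix-multiply units instead of general-purpose cores."""
    return x @ w + b

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 128))   # a batch of 32 input vectors
w = rng.standard_normal((128, 64))   # weight matrix (the "tensor" being reused)
b = np.zeros(64)                     # bias
y = dense_forward(x, w, b)
print(y.shape)  # (32, 64)
```

Training and inference for large models are dominated by stacks of operations like this one, which is why a chip specialized for them can outperform general-purpose hardware.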
While TPUs are more specialized than GPUs, they remain highly capable for AI training tasks. On performance, TPUs have been reported to deliver a 15- to 30-fold increase in speed and a 30- to 80-fold gain in efficiency (performance per watt) compared to CPUs and GPUs of the same era. In 2018, China's AGM Micro licensed TPU inference technology, but the company has been silent on TPU development since. More recently, a Chinese company called Zhonghao Xinying unveiled its high-performance TPU AI training chip. The chip, named "Shan-Ni," entered mass production last year and has been deployed in intelligent computing centers across the country.
This TPU supports interconnection of up to 1,024 chips, forming a large-scale intelligent computing cluster that is claimed to surpass traditional GPU clusters by orders of magnitude and to support training and inference for AI models with over 100 billion parameters. Zhonghao Xinying's founder, Yang Gongyifan, was previously a key chip developer at Google, where he was deeply involved in designing TPU v2, v3, and v4. He views the TPU as an ideal architecture for large AI models.
Additionally, a team at Peking University has made strides in next-generation chip technology by developing the world's first carbon nanotube-based TPU. The work targets two major bottlenecks in high-efficiency computing: the traditional von Neumann architecture's inability to keep up with high-speed data processing, and the scaling and power-consumption limits facing silicon transistors. Carbon nanotubes, with their excellent electrical properties, show promise in overcoming these hurdles and have demonstrated the potential for higher performance and lower power than commercial silicon transistors.
The new chip employs 2-bit MAC (multiply-accumulate) units, is fabricated at a 3-micron process node, and integrates 3,000 carbon-based transistors, achieving 100% accuracy in an image contour extraction task. Its architecture uses a systolic array design, in which operands flow rhythmically between neighboring compute cells; this maximizes data reuse and matches the computation pattern of neural networks.
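The data flow of a systolic array can be sketched in a few lines. The simulation below is a minimal illustration of the principle, not the chip's actual design: each "pulse" performs one rank-1 update, so every operand fetched from memory is reused across an entire row or column of accumulators rather than being re-read for each multiplication.

```python
import numpy as np

def systolic_matmul(a, b):
    """Simulate an output-stationary systolic matrix multiply.
    Each cell (i, j) keeps a running accumulator; on each pulse t,
    column a[:, t] and row b[t, :] flow through the grid and every
    cell multiplies and accumulates the operands passing by."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    acc = np.zeros((n, m))
    for t in range(k):                      # one pulse per shared-dim step
        acc += np.outer(a[:, t], b[t, :])   # rank-1 update: maximal reuse
    return acc

a = np.arange(6, dtype=float).reshape(2, 3)
b = np.arange(12, dtype=float).reshape(3, 4)
print(np.allclose(systolic_matmul(a, b), a @ b))  # True
```

Because neural-network layers are dominated by exactly these dense multiply-accumulates, the array's rhythmic reuse pattern is a natural fit for them.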
Among AI chips, TPUs have garnered considerable attention this year. Reports indicate that OpenAI has been recruiting former Google TPU team members for its in-house chip effort. Google originally introduced TPUs to reduce its dependence on Nvidia's GPUs, and at Google I/O 2024 it unveiled the sixth-generation TPU, claiming a 4.7-fold improvement in peak compute performance over the previous version, TPU v5e.
As Google works to replace external AI processing chips with its own TPUs—by some projections eventually accounting for 25% of global AI computing power—Nvidia's position looks less assured. With restrictions on special-edition chips mounting, the Chinese market is shifting its focus toward domestic suppliers. The AI chip market remains unpredictable, however, as shown by recent news of a Japanese AI chip manufacturer announcing its dissolution. Ultimately, the case for TPUs rests on superior energy efficiency and a robust software ecosystem—an area where China is making significant progress.