As artificial intelligence (AI) shifts from the cloud to on-device systems, consumers need a way to tell whether a given laptop will run generative AI applications faster than another laptop, desktop, or all-in-one. The difference matters: depending on the hardware, generating an image can take anywhere from a few seconds to several minutes, and, as the saying goes, time is money.
To make that comparison easier, MLCommons, an industry organization that develops AI hardware benchmarking standards, is launching an effort to build performance benchmarks specifically for consumer PCs, which it refers to as "client systems."
Today, MLCommons introduced the MLPerf Client working group, which aims to create AI benchmarks for desktops, laptops, and workstations running Windows, Linux, and other operating systems. These benchmarks will be "scenario-driven," focusing on actual end-user use cases and grounded in community feedback.
The initial focus of MLPerf Client will be on text-generating models, specifically Meta’s Llama 2. David Kanter, executive director of MLCommons, pointed out that Llama 2 has already been integrated into the organization’s benchmark suites for data center hardware. Meta has also collaborated extensively with Qualcomm and Microsoft to enhance Llama 2’s performance on Windows devices.
“The timing is perfect for introducing MLPerf to client systems, as AI has become a standard aspect of computing,” Kanter stated in a press release. “We are eager to collaborate with our members to elevate MLPerf’s impact on client systems and foster new capabilities for the broader community.”
The MLPerf Client working group counts AMD, Arm, Asus, Dell, Intel, Lenovo, Microsoft, Nvidia, and Qualcomm among its members, but not Apple. Apple’s absence is notable, particularly given that a Microsoft engineering director (Yannis Minadakis) co-chairs the MLPerf Client group. Without Apple at the table, MLPerf’s benchmarks may have limited applicability to Apple devices for the foreseeable future.
Nonetheless, I am eager to see what benchmarks and tools will arise from MLPerf Client, whether or not they cater to macOS. Given that generative AI shows no sign of declining, such metrics are likely to become increasingly influential in consumer device purchasing decisions.
In an ideal world, MLPerf Client benchmarks would function much like popular online PC build comparison tools, indicating the AI performance to expect from a specific machine. They might even expand to cover smartphones and tablets in the future, especially considering Qualcomm’s and Arm’s vested interests in the mobile ecosystem. While it is still early in this effort, I remain optimistic about what lies ahead.