Amazon’s cloud service, AWS, aims to democratize access to high-performance computing (HPC) with its new managed product, AWS Parallel Computing Service.
This service allows AWS customers to access powerful computer servers for large, compute-intensive workloads without needing dedicated systems administrators.
According to Ian Colle, the director of advanced compute and simulation at AWS, this broader access could significantly speed up innovation in technology and scientific discovery, fields that have traditionally relied on HPC clusters. "Many existing workloads could benefit from high-performance computing resources, but the perception that it's only for large enterprises often discourages exploration," Colle explained.
Colle believes this perception will shift as companies discover the ease of using HPC clusters with the new service, fostering greater experimentation. "We’re reducing the administrative burden and eliminating the need for substantial capital investment in HPC clusters. Now, all you need is an AWS account to run experiments and assess how workloads can scale," he added.
What the Service Offers
AWS Parallel Computing Service allows users to set up and manage clusters of Amazon Elastic Compute Cloud (EC2) instances. The service uses the open-source HPC workload manager Slurm to handle cluster maintenance, removing the need for dedicated system administrators.
Previously, AWS provided access to HPC clusters, but users had to handle cluster administration themselves. Now, customers aiming to scale scientific and engineering workloads can use familiar AWS tools, including the Management Console and software development kits. Because the service is built on Slurm, users can migrate existing workflows to an AWS HPC cluster without re-architecting them. Enterprises can also integrate the service with their own tooling through APIs.
Colle emphasized that AWS's offering simplifies cluster administration, allowing customers to completely offload Slurm management to the service.
Availability
The service is initially available in several AWS regions, including Ohio, Northern Virginia, and Oregon in the United States; Frankfurt, Stockholm, and Ireland in Europe; and Sydney, Singapore, and Tokyo in Asia-Pacific. Some AWS customers received early access, and their work illustrates the range of use cases for HPC clusters. Germany-based Marvel Fusion uses the service for research on limitless zero-emissions energy, while Australian company Ronin uses it for HPC simulations in the cloud.
Growing Demand for HPC Clusters
Demand for HPC clusters has surged as companies increasingly rely on compute power to train large language models and other foundation models. HPC clusters are now essential not only for compute-heavy tasks such as drug discovery but also for a wide range of AI workloads.
Traditionally, only major government labs and large corporations had access to supercomputers, with hardware manufacturers such as AMD, Intel, Nvidia, and IBM competing to create faster systems for these clients. However, the rise in interest from diverse companies has accelerated the growth of "HPC-as-a-service" offerings from cloud providers such as AWS, Google, Microsoft Azure, and Penguin Computing on Demand.
Gartner analyst Tony Harvey notes that while HPC-as-a-service isn't new, evolving use cases are prompting more companies to seek access to supercomputers. "We're likely to see increased competition in this space as more offerings emerge, especially since HPC use now spans beyond just AI," Harvey said.
He added that democratizing access to HPC resources gives users an alternative to waiting for time on high-performance supercomputers such as Frontier, the Hewlett Packard Enterprise-built system at Oak Ridge National Laboratory in Tennessee, which can have months-long waiting lists. "This enables new users to access these resources, maximizing the value of time for researchers and practitioners engaged in experimentation and predictive modeling," Harvey concluded.