Jensen Huang, CEO of Nvidia, delivered a keynote at Computex in Taiwan, focusing on how Nvidia Inference Microservices (NIM) can transform AI model deployment from weeks to mere minutes.
Huang explained that the world’s 28 million developers can now download Nvidia NIM, which offers optimized AI models as containers for deployment across clouds, data centers, or workstations. This technology enables users to quickly create generative AI applications — such as copilots and chatbots — significantly enhancing productivity.
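Since a NIM ships as a standard container image, deploying one on a GPU-equipped host looks like any other container launch. The sketch below is illustrative only: the registry path, image tag, and port are assumptions based on Nvidia's NGC conventions, not an exact catalog entry.

```shell
# Authenticate against Nvidia's NGC container registry
# (an NGC API key is assumed to be available).
docker login nvcr.io

# Pull and launch a NIM container; the image name and tag here are
# illustrative. The container serves an HTTP inference API on port 8000.
docker run --rm --gpus all -p 8000:8000 \
  nvcr.io/nim/meta/llama3-8b-instruct:latest
```

Because the same image runs unchanged on a workstation, a data-center node, or a cloud GPU instance, this single command is what "deployment across clouds, data centers, or workstations" amounts to in practice.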
As AI applications become more complex and rely on multiple models for generating text, images, video, and speech, Nvidia NIM streamlines the integration of generative AI into existing applications. This efficiency extends to enterprises, allowing them to maximize infrastructure investments. For instance, running the Meta Llama 3-8B model on NIM can generate up to three times more tokens than running the same model without it on the same infrastructure, increasing output with no additional compute cost.
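The "three times more tokens" figure is a throughput multiplier on fixed hardware. A toy calculation shows how such a multiplier translates into served responses; the baseline throughput and response length below are made-up numbers for illustration, not Nvidia benchmarks.

```python
# Hypothetical illustration of a 3x token-throughput gain on fixed hardware.
# All input figures are assumptions for the example, not measured benchmarks.

baseline_tokens_per_sec = 1_000   # assumed un-optimized throughput
nim_speedup = 3.0                 # the "up to 3x" claim from the keynote
tokens_per_response = 250         # assumed average response length

nim_tokens_per_sec = baseline_tokens_per_sec * nim_speedup

baseline_responses_per_min = baseline_tokens_per_sec * 60 / tokens_per_response
nim_responses_per_min = nim_tokens_per_sec * 60 / tokens_per_response

print(baseline_responses_per_min)  # 240.0
print(nim_responses_per_min)       # 720.0
```

The point of the claim is that the extra responses per minute come from software optimization alone, on hardware the enterprise already owns.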
Nearly 200 technology partners, including Cadence, Cloudera, and DataStax, have integrated NIM into their platforms to accelerate the deployment of generative AI for specialized applications. Hugging Face now also offers NIM, starting with the Meta Llama 3 model.
“Every enterprise is looking to incorporate generative AI, but not all have dedicated AI research teams,” said Huang. “Nvidia NIM is making generative AI accessible to all organizations by being integrated across platforms everywhere.”
NIM facilitates the deployment of AI applications through the Nvidia AI Enterprise software platform. Starting next month, members of the Nvidia Developer Program can access NIM for free for research and testing on preferred infrastructures.
NIM includes over 40 microservices that cater to various industries, such as healthcare. The NIM containers are pre-built for GPU-accelerated inference and can incorporate Nvidia's CUDA, Triton Inference Server, and TensorRT-LLM software.
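Once a NIM container is running, applications talk to it over HTTP. Assuming the OpenAI-compatible chat-completions interface that NIM containers expose, a minimal client sketch might look like the following; the endpoint URL and model name are assumptions to adjust for a real deployment.

```python
import json
from urllib import request

# Assumed local NIM endpoint and model identifier; adjust to your deployment.
NIM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta/llama3-8b-instruct"


def build_payload(prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion request body."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def complete(prompt: str) -> str:
    """POST the prompt to the NIM container and return the reply text."""
    body = json.dumps(build_payload(prompt)).encode()
    req = request.Request(
        NIM_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]


if __name__ == "__main__":
    print(complete("Summarize what a NIM container provides."))
```

Because the interface mirrors the widely used OpenAI API shape, existing chatbot or copilot code can often be pointed at a NIM container by changing only the base URL.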
Developers can access Nvidia NIM microservices for Meta Llama 3 via Hugging Face's platform, enabling easy deployment of Llama 3 models with just a few clicks. Enterprises can leverage NIM for generating text, images, video, speech, and even creating digital humans. Additionally, Nvidia BioNeMo NIM microservices assist researchers in innovating new protein structures to expedite drug discovery.
Numerous healthcare organizations are utilizing NIM for various applications, including surgical planning and clinical trial optimization.
Leading technology providers like Canonical, Red Hat, and VMware are supporting NIM on open-source KServe, while AI companies such as Hippocratic AI and Glean are embedding NIM for generative AI inference. Major global consulting firms, including Accenture and Deloitte, are developing NIM competencies to help enterprises launch AI strategies swiftly.
NIM-enabled applications can be deployed on Nvidia-certified systems, including those from Cisco, Dell Technologies, and other major manufacturers, as well as cloud platforms like AWS and Google Cloud. Notable companies such as Foxconn and Lowe’s are already applying NIM in fields like manufacturing, healthcare, and retail.
Nvidia is expanding its certified systems program, ensuring platforms are optimized for AI and accelerated computing. New certifications include Spectrum-X Ready systems for data centers and IGX systems for edge computing, both validated for enterprise-grade performance.
Through NIM, enterprises worldwide are establishing “AI factories” to streamline data processing and enhance intelligence output. Nvidia NIM, combined with KServe, will simplify generative AI deployments, making it accessible via platforms from partners like Canonical and Nutanix.
Moreover, Huang highlighted that Meta Llama 3, a cutting-edge large language model trained with Nvidia's accelerated computing, is significantly improving workflows in healthcare and life sciences. Now available as an Nvidia NIM inference microservice at ai.nvidia.com, Llama 3 equips developers with the tools needed to innovate responsibly across various applications, including surgical planning and drug discovery.