The reasons why AMD’s MI300X has a competitive advantage over Nvidia’s H100

According to TechInsights, the overall market for data-center AI chips in 2023 was $17.7 billion, with Nvidia holding a 65% market share. Intel was second with 22%, while AMD (MI300X) had 11%. The picture is different for chips used to train AI models: there, Nvidia accounts for 70-95% of the market, according to Mizuho Securities.

What is the main source of Nvidia’s dominance?

Nvidia’s CUDA (Compute Unified Device Architecture), introduced in 2006, allowed developers to program GPUs in general-purpose languages such as C/C++. CUDA unlocked the GPUs’ computational power, particularly for the matrix operations central to AI tasks like deep learning.

CUDA allows Nvidia GPUs to achieve higher compute utilization than competitors. It’s a proprietary parallel computing platform that includes a compiler, runtime environment, and toolkit specifically designed to optimize operations for Nvidia GPUs. This results in better performance for tasks like training AI models, which are highly parallelizable.

CUDA’s proprietary nature has created a “lock-in” effect, where developers and organizations are reluctant to switch to other platforms like AMD’s ROCm or open standards like OpenCL. Competitors have struggled to gain market share, as their architectures underperform Nvidia’s highly optimized CUDA platform when training AI models.

Nvidia has faced growing antitrust scrutiny in both the US and France due to its dominant position in the AI chip market, particularly its GPUs and proprietary CUDA software. French regulators are set to charge Nvidia with anticompetitive practices, marking the first such action against the company. The core concern is that Nvidia’s closed CUDA ecosystem makes it difficult for competitors to challenge its dominance, because transitioning from CUDA to other platforms is difficult.

PyTorch is a high-level deep learning framework that gives developers the tools to design, train, and deploy AI models. It simplifies model development, experimentation, and deployment across various platforms (e.g., CPUs, GPUs). PyTorch provides Python objects and functions that are much more user-friendly than those in C.

<PyTorch has supported both CUDA and ROCm (AMD) since 2021>

PyTorch supports MI300x

Source: Pytorch.org

As PyTorch’s support for ROCm expands, it lowers the barrier for users to adopt AMD GPUs. This provides a viable alternative to Nvidia’s dominance in the AI market. AMD GPUs also tend to be more cost-efficient than Nvidia’s high-end GPUs.
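To make the lower barrier concrete, here is a minimal sketch of how a script might check which GPU backend its PyTorch build targets. It relies on the fact that ROCm builds of PyTorch reuse the `torch.cuda` API and the `"cuda"` device string, and set `torch.version.hip`. The function name `detect_backend` is hypothetical, and the import guard is only there so the sketch also runs where PyTorch is not installed.

```python
import importlib.util


def detect_backend():
    """Return 'rocm', 'cuda', 'cpu', or 'unavailable' (hypothetical helper)."""
    # Guard so the sketch still runs on machines without PyTorch.
    if importlib.util.find_spec("torch") is None:
        return "unavailable"
    import torch

    # ROCm builds expose AMD GPUs through the same torch.cuda namespace.
    if not torch.cuda.is_available():
        return "cpu"
    # torch.version.hip is a version string on ROCm builds, None on CUDA builds.
    return "rocm" if getattr(torch.version, "hip", None) else "cuda"


backend = detect_backend()
# The device string is "cuda" for both Nvidia and AMD GPUs.
device = "cuda" if backend in ("rocm", "cuda") else "cpu"
print(f"backend={backend}, device={device}")
```

Because the device string is the same on both vendors, model code written as `model.to(device)` typically does not need to change when moving between CUDA and ROCm builds.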

The surprising performance of AMD’s MI300X

AMD Instinct MI300X GPUs, powered by the open-source ROCm software stack, showed impressive results in the MLPerf Inference v4.1 round, highlighting the strength of AMD’s inference platform, according to an announcement by AMD on August 28, 2024.

AMD MI300x performance

Source: AMD

A single MI300X with its 192GB of HBM3 memory efficiently ran the entire LLaMA2-70B model, eliminating the need to split the model across multiple GPUs, which can introduce network overhead. In a configuration with 8x MI300X GPUs and 2x AMD EPYC 9374F CPUs, the system’s performance was within 2-3% of the Nvidia DGX H100 setup with Intel Xeon CPUs, demonstrating near-parity in the server and offline scenarios at FP8 precision.

<LLaMA 3.1 (405B) estimated memory requirements>

AMD MI300X Memory requirements for LLaMa 3.1

The AMD Instinct MI300X platform’s industry-leading memory allows a server with eight MI300X GPUs to run the entire LLaMA 3.1 model (with 405 billion parameters) on a single server using the FP16 datatype. This reduces the number of servers needed and helps lower costs.
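The memory claims above can be checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate, not AMD's sizing method: it counts model weights only (no KV cache, activations, or framework overhead) and uses decimal gigabytes (1 GB = 1e9 bytes). The helper names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}

MI300X_GB = 192  # HBM3 capacity per GPU, as stated in the article


def weight_memory_gb(params_billion, dtype):
    """Weights-only footprint in decimal GB (assumption: 1 GB = 1e9 bytes)."""
    # 1e9 parameters * N bytes each = N GB, so the billions cancel out.
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion, dtype, gpu_memory_gb, num_gpus=1):
    """True if the weights alone fit in the combined GPU memory."""
    return weight_memory_gb(params_billion, dtype) <= gpu_memory_gb * num_gpus


# LLaMA2-70B at FP8 on a single MI300X.
print(weight_memory_gb(70, "fp8"), "GB of weights vs", MI300X_GB, "GB on one GPU")

# LLaMA 3.1 405B at FP16 on an eight-GPU server.
print(weight_memory_gb(405, "fp16"), "GB of weights vs", 8 * MI300X_GB, "GB on eight GPUs")
```

Under these assumptions the 405B model needs about 810 GB for weights, which fits within the 1,536 GB of an eight-GPU MI300X server but would not fit on four such GPUs (768 GB). Real deployments need additional headroom for the KV cache and activations.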

AMD’s MI300X is cheaper than Nvidia’s H100 or H200. Official prices have not been released, but the MI300X is estimated at $20,000, the H100 at about 10% more, and the H200 at $30,000.
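One way to read these estimates is as cost per gigabyte of on-package memory. The sketch below uses the estimated prices quoted above and the MI300X's 192 GB from this article. The 80 GB (H100) and 141 GB (H200) capacities are not from this article; they are the commonly published figures for those parts and are treated here as assumptions. The prices are estimates, so the output is illustrative only.

```python
# Estimated prices in USD, taken from the article's figures.
MI300X_PRICE = 20_000
H100_PRICE = MI300X_PRICE * 110 // 100  # "10% higher" than the MI300X
H200_PRICE = 30_000

# (price in USD, memory in GB). H100 and H200 capacities are assumptions
# based on commonly published specs, not figures from this article.
gpus = {
    "MI300X": (MI300X_PRICE, 192),
    "H100": (H100_PRICE, 80),
    "H200": (H200_PRICE, 141),
}

# Cost per GB of GPU memory for each part.
cost_per_gb = {name: price / mem for name, (price, mem) in gpus.items()}

for name, cost in sorted(cost_per_gb.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:,.2f} per GB")

cheapest = min(cost_per_gb, key=cost_per_gb.get)
```

This metric ignores compute throughput, interconnect, and software maturity, so it only captures the memory side of the value argument.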

The strong value and comparable performance to Nvidia’s H100 are likely to draw more users to AMD’s MI300X for building inferencing systems.
