Multi-chip GPU with up to 47.9 TFLOPS, 128 GB and 560 W
AMD has introduced the next generation of its Instinct accelerators for high-performance computing (HPC): the Instinct MI200 series. It uses the 2nd generation of the CDNA architecture (CDNA 2), which carries AMD’s multi-chip approach from the CPU over to the GPU segment. Two GPU dies in a single package promise record performance.
CDNA vs. RDNA
CDNA is a pure compute GPU architecture and dispenses with the so-called fixed-function hardware: it has no texture units, rasterizers or display outputs at all. The Infinity Cache introduced with RDNA 2 in gaming GPUs is also missing. Instead, there is fast HBM2e memory on a wide interface directly on the package, along with a few other important differences. The biggest change, however, is the multi-chip approach.
CDNA 2 is a multi-chip architecture
With CDNA 2, AMD brings the multi-chip approach introduced with Zen for CPUs into the GPU segment, and all rumors indicate that RDNA 3 will follow the same path for gaming graphics cards.
For the Instinct MI200, which will be available in three SKUs at launch (MI250X, MI250 and MI210), this means in concrete terms: on the gigantic package, eight HBM2e stacks are lined up not around one but around two GPU dies, whose basis remains Graphics Core Next (GCN). A third die, which the rumor mill expects for RDNA 3 and which is believed to contain the Infinity Cache, is not used in CDNA 2. Both GPU dies communicate with each other via a “very high bandwidth fabric interconnect”.
Each CDNA-2 die contains 110 CUs, or 7,040 shaders, and is made up of 29 billion transistors (Navi 21: 26.8 billion), for a total of 220 CUs, or 14,080 shaders, and 58 billion transistors. It is not known whether this corresponds to the full configuration. The fact that Arcturus, the 1st-generation CDNA GPU, already offered 120 CUs suggests a slightly cut-down design.
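The quoted shader counts follow directly from the CU counts, assuming CDNA 2 keeps the 64 stream processors per compute unit of GCN/CDNA 1 (an assumption based on the MI100, not confirmed in the text). A quick sanity check:

```python
# Back-of-the-envelope check of the per-die and total shader counts.
# SP_PER_CU = 64 is an assumption carried over from GCN/CDNA 1.
SP_PER_CU = 64      # stream processors ("shaders") per compute unit
CUS_PER_DIE = 110
DIES = 2

shaders_per_die = CUS_PER_DIE * SP_PER_CU   # 7,040
total_cus = CUS_PER_DIE * DIES              # 220
total_shaders = shaders_per_die * DIES      # 14,080

print(shaders_per_die, total_cus, total_shaders)
```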
Faster matrix cores and faster memory
AMD has significantly extended the GPU’s ability to process matrices, as used in particular in AI applications: up to 880 so-called 2nd-generation matrix cores, which now also support FP64, are available.
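The figure of 880 matrix cores lines up with four matrix cores per compute unit across both dies; the four-per-CU split is an assumption based on the 220-CU total quoted earlier, not a number AMD states here:

```python
# Plausibility check: 880 matrix cores across 220 CUs.
# MATRIX_CORES_PER_CU = 4 is an assumption, not an official figure.
MATRIX_CORES_PER_CU = 4
TOTAL_CUS = 220

print(MATRIX_CORES_PER_CU * TOTAL_CUS)  # 880
```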
With HBM2e, Aldebaran also uses faster memory. With a clock frequency of 1.6 instead of 1.2 GHz on a now 8,192-bit-wide interface, a memory bandwidth of 3.2 TB/s is achieved. That is more than the Infinity Cache of Navi 21 delivers (2.0 TB/s). The way the memory is attached to the GPU is also new: the 2.5D Elevated Fanout Bridge (EFB) simplifies packaging and reduces costs, because the bridge chip no longer sits in a specially adapted base substrate but in an additional layer. According to AMD, this approach also scales better.
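The 3.2 TB/s figure can be reproduced from the interface width and clock, assuming HBM2e’s double data rate (two transfers per clock); the exact product is about 3.28 TB/s, which AMD rounds down:

```python
# Rough reconstruction of the quoted 3.2 TB/s memory bandwidth.
# Assumes double data rate (2 transfers per clock) for HBM2e.
BUS_WIDTH_BITS = 8192   # eight stacks x 1,024 bit
CLOCK_GHZ = 1.6
DDR_FACTOR = 2

bandwidth_tb_s = BUS_WIDTH_BITS / 8 * CLOCK_GHZ * DDR_FACTOR / 1000
print(round(bandwidth_tb_s, 2))  # 3.28 (marketed as 3.2 TB/s)
```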
AMD’s first chip from 6nm production
The new multi-chip GPU, codenamed “Aldebaran”, is AMD’s first chip produced at TSMC using the 6 nm manufacturing process. RDNA 3 (5 nm and 6 nm) and the Rembrandt APU, which will combine Zen 3 cores with an RDNA 2 GPU, are also expected on this basis in 2022. Even Zen 3+ in 6 nm has already been mentioned, but so far it has not made it out of the rumor mill.
Instinct MI250X and MI250 in OAM design
As the first two products based on CDNA 2, AMD today introduces the Instinct MI250X and Instinct MI250. Both come in the OCP Accelerator Module (OAM) form factor, which the Open Compute Project has standardized. Nvidia, in turn, uses a proprietary form factor for the A100.
Up to 560 watts TDP
The only difference between the MI250X and the MI250 is the number of active compute units; the maximum clock frequency and maximum board power are the same at 1,700 MHz and 560 watts, respectively. The 560 watts require water cooling in the server rack; with passive air cooling, the limit is 500 watts each, at what clock rates AMD does not say.
Both SKUs use 128 GB of HBM2e with a clock rate of 1.6 GHz. The fact that AMD speaks of “up to 128 GB” for the series suggests that it could also offer variants with less memory.
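The 128 GB capacity is consistent with eight HBM2e stacks of 16 GB each; the per-stack size is an inference from the stack count mentioned earlier, not something AMD spells out here:

```python
# Plausibility check: 128 GB from eight HBM2e stacks.
# GB_PER_STACK = 16 is an assumption (a common HBM2e stack size).
STACKS = 8
GB_PER_STACK = 16

print(STACKS * GB_PER_STACK)  # 128
```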
The promised performance is huge
Based on its own figures for the Instinct MI250X with a TDP of 560 watts, AMD promises more than double the performance of the MI100 in conventional vector workloads with single precision (FP32). With “only” an 87 percent increase in power consumption, efficiency therefore rises. This is even more clearly the case with double precision (FP64), which the new GPU can for the first time execute at full rate: here performance more than quadruples. Compared to the MI100, matrix throughput also doubles, and FP64 matrix operations are possible for the first time with the MI200.
However, putting the MI100 in the shade is only AMD’s second priority. The entire presentation of the new series makes that clear: the real target is Nvidia’s A100, which AMD wants to clearly beat in HPC workloads and, thanks to sheer multi-GPU performance, even in applications such as AI training, although Nvidia’s Ampere offers units better optimized for that purpose with its Tensor Cores.
The first units go to Frontier
The AMD Instinct MI200 is available now, but initially only to one customer: HPE. The manufacturer is currently installing the Frontier supercomputer in the United States, which is expected to become the first exascale supercomputer outside of China. The first scientists should be able to use the machine at Oak Ridge National Laboratory in 2022.
Instinct MI210: Soon also available in plug-in card format
Also announced, but not yet presented in detail, is the Instinct MI210. Like the MI100, it comes in the classic PCIe add-in card format (PCIe 4.0) and should appear in early 2022. AMD has not yet revealed key specifications. It is quite possible that the “up to 128 GB HBM2e” refers to this model of the series and that it (also) comes with less memory.
In the evening, AMD also officially presented the new “Milan-X” Epyc processors, which offer 768 MB of L3 cache thanks to stacked 3D V-Cache chips.