AI Hardware & Systems
Inside Microsoft AI hardware innovation
Mark Russinovich (markrussinovich)
CTO, Deputy CISO and Technical Fellow, Microsoft Azure

Microsoft AI supercomputer
    10,000 V100 GPUs, #5 supercomputer, May 2020
    14,400 H100 GPUs, #3 in TOP500, Nov 2023
    30 x supercomputers, May 2024

Accelerators | Cooling | Networking | Power | Confidential AI

Diverse accelerators on Azure
    A100, H100 (available today)
    H200, GB200 (coming soon)
    MI300X (available today)
    Azure Maia 100 (internal)

Maia 100 Specs
    Chip size: 820 mm², N5
    Package/interposer technology: TSMC CoWoS-S
    HBM BW/Cap: 1.8 TB/s, 64 GB HBM2E
    Peak dense tensor POPS: 6-bit: 3, 9-bit: 1.5, BF16: 0.8
    L1/L2: 500 MB
    Backend network BW: 600 GB/s (12 x 400GbE)
    Host BW (PCIe): 32 GB/s, PCIe Gen5 x8
    Design to TDP: 700 W
    Provision TDP: 500 W

Inside Maia 100
Single tile
    Tile control processor
    TDMA (tile data movement engine)
    L1 SRAM
    Tensor unit
    Vector engine
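The two bandwidth rows in the Maia 100 specs can be sanity-checked with simple unit conversion. The sketch below is only an illustration: the constant and function names are made up for this example, the link counts and rates are taken from the spec slide, and the 32 GT/s per-lane rate and 128b/130b encoding are the standard PCIe Gen5 values rather than anything stated in the deck.

```python
# Figures from the Maia 100 spec slide; names here are illustrative only.
BACKEND_LINKS = 12           # "12x400gbe" on the slide
GBE_LINK_GBPS = 400          # one 400GbE link, in gigabits per second
PCIE_LANES = 8               # "Gen5x8" on the slide
PCIE_GEN5_GT_PER_LANE = 32   # assumption: standard PCIe Gen5 rate, GT/s per lane


def gbits_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / 8


# Backend network: 12 links x 400 Gb/s = 4800 Gb/s
backend_gbytes = gbits_to_gbytes(BACKEND_LINKS * GBE_LINK_GBPS)

# Host link: 8 lanes x 32 GT/s = 256 Gb/s raw, per direction
host_gbytes_raw = gbits_to_gbytes(PCIE_LANES * PCIE_GEN5_GT_PER_LANE)

# 128b/130b line encoding carries 128 payload bits per 130 bits on the wire
host_gbytes_effective = host_gbytes_raw * 128 / 130

print(f"backend: {backend_gbytes:.0f} GB/s")          # matches the 600 GB/s row
print(f"host (raw): {host_gbytes_raw:.0f} GB/s")      # matches the 32 GB/s row
print(f"host (after encoding): {host_gbytes_effective:.1f} GB/s")
```

Under these assumptions both slide figures appear to be raw link rates: the PCIe number is per direction and before encoding overhead, and the backend number does not account for Ethernet protocol overhead.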
SoC
    Mesh-like NOC topology, with features optimized for ML
    Great perf/W and performance
    4 tiles per cluster
    16 clusters per SoC
    [Diagram: MAIA100 SoC. Cluster labels: Tile (x4), SYNC, CDMA, CSRAM, CCP, CMP, Fabric (Data, MSG, CFG). SoC labels: High BW Data Mesh NOC, HBM2E PHY, PAM4 112G SerDes PHY, PCIe PHY x16 Gen4, Chip Management, Image Decoder, 4096 IO, 12x4 PAM4, PCIe x8, Security Management]
    [Diagram: Cluster. Labels: Tile (x4), CCP, CDMA, L2 SRAM, Cluster NOC]

Maia 100
Server chassis

Maia liquid cooling

Microfluidics cooling
    Cold plate
    Microfluidic cooling system
    [Diagram labels: Interposer, Inlet, Outlet, Logic/FPGA, Memory stack, Staggered micropin-fins, Pin height 200 µm]
    Micropin-fins on CPU
    Fluid delivery tubes connected

InfiniBand networking
Intra-cluster networking
IB Leaf switches
IB Leaf switches
IB Leaf switc