Inside Maia 100
Sherry Xu, Partner Lead SOC Architect
Chandru Ramakrishnan, Partner SW Eng Manager

Maia 100 Introduction
- Microsoft's 1st-gen custom AI accelerator
- Targets large-scale AI workloads
- Designed specifically for Azure to run production OpenAI models
- Vertical integration to optimize performance and reduce cost
- Software-hardware codesign to unlock new capabilities
- Custom server boards with tailor-made racks
- Improved power efficiency
- First generation designed for wide deployability: software stack build-up, liquid cooling enablement

Maia 100 Specs
- Chip Size: 820 mm² (N5)
- Package/Interposer Technology: TSMC CoWoS-S
- HBM BW/Capacity: 1.8 TB/s, 64 GB HBM2E
- Peak Dense Tensor POPS: 6-bit: 3, 9-bit: 1.5, BF16: 0.8
- L1/L2: 500 MB
- Backend Network BW: 600 GB/s (12x400GbE)
- Host BW (PCIe): 32 GB/s, PCIe Gen5 x8
- Design-to TDP: 700 W
- Provisioned TDP: 500 W
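As a quick sanity check on how these numbers relate, the short Python sketch below (a back-of-envelope illustration, not part of the deck) computes the arithmetic intensity at which each peak dense tensor rate balances the 1.8 TB/s HBM bandwidth.

```python
# Back-of-envelope roofline balance point for Maia 100, using the peak
# dense tensor rates and HBM bandwidth quoted above. The bytes-per-op
# framing is illustrative only.

HBM_BW = 1.8e12  # bytes/s (1.8 TB/s)

peak_ops = {       # peak dense tensor operations per second
    "6-bit": 3.0e15,
    "9-bit": 1.5e15,
    "BF16":  0.8e15,
}

for dtype, ops in peak_ops.items():
    # Arithmetic intensity (ops per byte of HBM traffic) needed to keep
    # the tensor unit busy rather than stalling on memory.
    balance = ops / HBM_BW
    print(f"{dtype:>5}: ~{balance:.0f} ops per HBM byte to be compute-bound")

# BF16 example: 0.8e15 / 1.8e12 ≈ 444 ops/byte, so memory-heavy kernels
# are bandwidth-bound and benefit from the large (500 MB) on-die SRAM.
```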
Inside Maia 100
[SoC block diagram: tiles grouped into clusters with SYNC, CCP, CMP, CDMA, and CSRAM/L2 SRAM blocks on a fabric (Data, MSG, CFG) and cluster NOC; high-BW data mesh NOC; HBM2E PHY; PAM4 112G SerDes PHY; PCIe PHY x16 Gen4; chip management; image decoder; security management; 4096 IO; 12x400 PAM4; PCIe x8]
- 4 Tiles Per Cluster
- 16 Clusters Per SoC
- TTU: Tensor Unit
- TVP: Vector engine
- TDMA: Tile Data Movement Engine
- TCP: Tile Control Processor
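To make the on-chip hierarchy concrete, the Python sketch below models the 16-clusters-by-4-tiles organization and the per-tile engines listed in the glossary above. The class and field names are invented for illustration; this is not a Maia SDK data model.

```python
from dataclasses import dataclass, field

# Minimal model of the Maia 100 compute hierarchy described above:
# 16 clusters per SoC, 4 tiles per cluster, and the per-tile engines
# (TTU, TVP, TDMA, TCP). Names are illustrative, not an actual API.

@dataclass
class Tile:
    tile_id: int
    engines: tuple = ("TTU", "TVP", "TDMA", "TCP")  # tensor, vector, data movement, control

@dataclass
class Cluster:
    cluster_id: int
    tiles: list = field(default_factory=list)

def build_soc(clusters_per_soc: int = 16, tiles_per_cluster: int = 4) -> list:
    soc = []
    for c in range(clusters_per_soc):
        cluster = Cluster(cluster_id=c)
        for t in range(tiles_per_cluster):
            cluster.tiles.append(Tile(tile_id=c * tiles_per_cluster + t))
        soc.append(cluster)
    return soc

soc = build_soc()
print(len(soc), "clusters,", sum(len(c.tiles) for c in soc), "tiles total")  # 16 clusters, 64 tiles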
ML-Specific Architecture
- High-speed tensor unit
  - Supports a wide range of data types, including the MX data format (sketched below): 9-bit and 6-bit compute
  - Constructed as a 16xRx16 unit; sizing is a trade-off between granularity loss and peak performance
- Vector processor with a custom ISA tailored for ML
  - Loosely coupled superscalar engine
  - Supports FP32 and BF16
- DMA engine supports different tensor sharding schemes
- Hardware semaphores enable asynchronous programming
- Achieve great perf/W and performance
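MX here refers to the microscaling family of data formats, in which a block of low-bit-width elements shares a single power-of-two scale. The NumPy sketch below shows that block-scaling idea; the block size of 32 and the FP6-like element range are assumptions for illustration and are not a statement of how Maia 100's 9-bit and 6-bit paths are actually encoded.

```python
import numpy as np

# Illustration of the microscaling (MX) idea: a block of values shares one
# power-of-two scale, and each element is stored at a small bit-width.
# Block size (32) and the 6-bit-style element range are assumptions for
# illustration, not the exact Maia 100 encoding.

def mx_quantize(x: np.ndarray, block: int = 32, elem_max: float = 7.5):
    """Quantize a 1-D tensor (length divisible by `block`) into shared-scale blocks."""
    x = x.reshape(-1, block)
    # Shared per-block scale: power of two that brings the block's max
    # magnitude within the representable element range.
    amax = np.abs(x).max(axis=1, keepdims=True) + 1e-30
    scale = 2.0 ** np.ceil(np.log2(amax / elem_max))
    # Quantize elements to a coarse grid (a stand-in for an FP6-like format).
    q = np.clip(np.round(x / scale * 2) / 2, -elem_max, elem_max)
    return q, scale

def mx_dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q * scale).reshape(-1)

x = np.random.randn(128).astype(np.float32)
q, s = mx_quantize(x)
err = np.abs(mx_dequantize(q, s) - x).max()
print(f"max abs quantization error: {err:.3f}")
```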
ML-Specific Architecture
- Achieve great perf/W and performance
- Data movement is a well-known bottleneck
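One pattern the DMA engine and hardware semaphores make natural is double buffering, which directly targets the data-movement bottleneck named above: the DMA engine fills one buffer while the compute engines drain the other, with semaphores enforcing the hand-off. The sketch below shows that pattern with Python threads; dma_copy and compute_tile are hypothetical stand-ins, and threading.Semaphore stands in for the hardware semaphores rather than reflecting any actual Maia API.

```python
import threading

# Double-buffering pattern enabled by hardware semaphores: DMA fills one
# tile-sized buffer while compute consumes the other, and semaphores
# enforce producer/consumer ordering. All names are illustrative.

NUM_TILES = 8
buffers = [bytearray(1024), bytearray(1024)]              # two staging buffers
empty = [threading.Semaphore(1), threading.Semaphore(1)]  # buffer free for DMA
full  = [threading.Semaphore(0), threading.Semaphore(0)]  # buffer ready for compute

def dma_copy(dst: bytearray, tile_idx: int) -> None:
    dst[:] = bytes([tile_idx % 256]) * len(dst)            # pretend HBM -> SRAM transfer

def compute_tile(src: bytearray, tile_idx: int) -> None:
    _ = sum(src)                                           # pretend tensor-unit work

def dma_engine():
    for i in range(NUM_TILES):
        b = i % 2
        empty[b].acquire()          # wait until compute has released this buffer
        dma_copy(buffers[b], i)
        full[b].release()           # signal: tile i is ready to be consumed

def compute_engine():
    for i in range(NUM_TILES):
        b = i % 2
        full[b].acquire()           # wait for DMA to finish filling this buffer
        compute_tile(buffers[b], i)
        empty[b].release()          # hand the buffer back to the DMA engine

t1 = threading.Thread(target=dma_engine)
t2 = threading.Thread(target=compute_engine)
t1.start(); t2.start(); t1.join(); t2.join()
print("processed", NUM_TILES, "tiles with overlapped DMA and compute")
```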