Sustainable Computing for AI & Cloud Native Workloads
Matthew Erler, Architect
Aug 27, 2024

A MODERN SEMICONDUCTOR COMPANY BUILDING THE FIRST CLOUD NATIVE PROCESSORS FOR THE SUSTAINABLE CLOUD

Paradigm Shift
Traditional techniques no longer scale:
- Turbo Frequency
- Hyperthreading
- Scale Up Accelerators
Ampere Cloud Native Processors do:
- Power Optimized, Consistent Performance
- Linear Core Scaling
- High Performance, General-Purpose Cores

The Ampere Roadmap: Powerful Roadmap with Rapid Innovation
Continued commitment to leadership performance per rack for AI compute in air-cooled environments.

Ampere Altra Family
- Up to 80 cores, 7nm, 8-channel DDR4, 128 lanes PCIe Gen4
- Up to 128 cores, 7nm, 8-channel DDR4, 128 lanes PCIe Gen4

AmpereOne Family*
- AmpereOne: up to 192 cores, 5nm, 8-channel DDR5
- AmpereOne "M": up to 192 cores, 5nm, 12-channel DDR5
- AmpereOne "MX": up to 256 cores, 3nm, 12-channel DDR5
- AmpereOne Aurora (next design product): up to 512 cores, integrated AI silicon for training and inference, air cooled
Product status labels on the slide: Shipping Now, Shipping Q4 24, In Fabrication.
*Remains Arm ISA compliant. Continued ship support at least through 2030.

AmpereOne Core: Overview
1. Front End: branch predictor, fetch queue, 16KB L1 instruction cache, decode & rename
2. Execution: 192 scheduler entries, 208-entry reorder buffer, 166-entry integer register file, 128-entry FP/vector register file, and eight issue ports:
   - Integer A0: ALU, branch, flag, store
   - Integer A1: ALU, branch, flag
   - Integer B0: ALU, complex shifts, multi-cycle
   - Integer B1: ALU, complex shifts
   - Memory 0: load/store
   - Memory 1: load
   - Vector & FP X: vector FP, FP store data
   - Vector & FP Y: vector FP
3. Load Store: 64KB L1 data cache
4. Memory Management: instruction MMU and data MMU
5. L2 Cache: 2MB L2 cache and SOC interface

AmpereOne Core Pipeline: Fetch, Decode and Rename
State-of-the-art branch prediction:
- 8-table TAGE direction predictor
- L1 and L2 BTBs: 256-entry 0-cycle L1 BTB; 8k-entry 2-cycle L2 BTB
- Dedicated indirect predict