Chiplets for Generative AI
Modular accelerator chiplets with high bandwidth memory stacks
Jawad Nasrullah, CEO, Palo Alto Electron Inc

Agenda
1. Heterogeneous chips for generative AI - an example
2. AI memory demand beyond HBM
3. Challenges and opportunities of AI HW
4. ODSA call to action

Foundation Large Language Model (LLM)
- ChatGPT3: 175 billion parameters
- ChatGPT4: 1.7 trillion parameters; 4 TB (inference), 27 TB (training)
- ChatGPT5: 20x increase?
- A simple view of the generative pipeline: prompt in, output out, with the model trained on language corpora (e.g., SPICE and Verilog manuals, app notes)

GPT4 training cost estimates
- ~2,000-3,000 units needed for a month to train GPT
- Rental cost per training run: $80 million
- Electricity cost per training run: $4 million
- Equipment CapEx: $450 million
- (Greenhouse gas emissions equivalent to 1,600 gas-powered cars for a year)

Data center GPU estimates
- ~75,000 units at 10.2 kW each ("600k H100 equivalent" -
Meta/Zuckerberg)
- Power demand = 750 MW; needs its own power generation
- Equipment CapEx: $10 billion
- Compute = 1 x 10^18 math operations/s; power dissipation = 20 W; memory = 2.5 x 10^15 bytes = 2.5 petabytes

Chips for AI - a heterogeneous example
- AMD MI300A: 228 GPU compute units, 24 CPU cores, 128 GB HBM DRAM, 5.3 TB/s memory BW
- AMD MI300X: 304 GPU compute units, 192 GB HBM DRAM, 5.3 TB/s memory BW, 750 W
- Power delivery: 750 A at 1 V, or ~16 A at 48 V
- Gen AI is justifying super-expensive chips

MI300 systems leverage OCP open accelerator infrastructure
- AMD MI300 chip: 304 GPU compute units (8 XCDs), 192 GB HBM DRAM, 5.3 TB/s memory BW, 750 W TBP
- MI300 OCP OAM module with OAM heatsink; module dimensions 78 mm / 170 mm / 102 mm
- MI300 OCP UBB: 8x OAM, 1.5 TB HBM DRAM, 42 TB/s memory BW, 10 kW
- Rack: 50 kW
[Figure: MI300 OAM (304 GPU compute units, 192 GB, 750 W) package cross-section - HBM stacks and GPU/IOD/CPU chiplets on a passive Si interposer over the package substrate]

HBM and beyond
Still "there is plenty of room at the bottom"
[Figure: HBM stack and GPU on silicon interposer - 745 um GPU die, 50 um stacked dies, 1024-bit data bus]
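The headline aggregates in these slides follow from simple per-unit arithmetic. A quick sanity check, using only the per-unit figures quoted above (75k units at 10.2 kW, 8 OAM modules per UBB at 192 GB / 5.3 TB/s / 750 W each):

```python
# Sanity-check the deck's aggregate numbers from its per-unit figures.

# Data-center estimate: 75k accelerator units at 10.2 kW each
units = 75_000
power_per_unit_kw = 10.2
total_mw = units * power_per_unit_kw / 1000
print(f"Data-center power demand: {total_mw:.0f} MW")  # ~765 MW; the slide rounds to 750 MW

# MI300 UBB: 8 OAM modules, each with 192 GB HBM at 5.3 TB/s and 750 W
oams = 8
hbm_tb = oams * 192 / 1000   # 1.536 TB (slide: 1.5 TB)
bw_tb_s = oams * 5.3         # 42.4 TB/s (slide: 42 TB/s)
oam_kw = oams * 0.750        # 6 kW of accelerators within the ~10 kW UBB budget
print(f"UBB: {hbm_tb:.2f} TB HBM, {bw_tb_s:.1f} TB/s, {oam_kw:.1f} kW in accelerators")

# Power delivery for one 750 W module: 750 A at 1 V, or ~16 A at 48 V
print(f"Current at 1 V: {750 / 1:.0f} A; at 48 V: {750 / 48:.1f} A")
```

The 48 V number illustrates why high-voltage delivery matters at these power levels: the same 750 W needs roughly 48x less current, and correspondingly less copper and I^2R loss, than delivery at 1 V.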