NVIDIA HOPPER GPU: SCALING PERFORMANCE
JACK CHOQUETTE | AUGUST 2022

AGENDA
H100 GPU Overview
Accelerating Principles for Performance
Data Locality & Cooperative Execution
Asynchronous Execution & Data Transfer
Accelerating Deep Learning
Preview: Scaling Up and Out
Wrap Up

HOPPER H100 TENSOR CORE GPU
80B Transistors, TSMC 4N
2nd Gen Multi-Instance GPU
Confidential Computing
PCIe Gen5
New Memory System
  World's First HBM3 DRAM
  Larger 50 MB L2
4th Gen NVLink
  900 GB/s total BW
  New SHARP support
  NVLink Network
132 SMs
  2x Performance per Clock
  4th Gen Tensor Core
  Thread Block Clusters

NEW HOPPER SM ARCHITECTURE
2x faster FP32 & FP64 FMA
256 KB L1$/Shared Memory
New 4th Gen Tensor Core
New DPX instruction set
New Tensor Memory Accelerator
  Fully asynchronous data movement
New Thread Block Clusters
  Turn locality into efficiency

WORLD'S FIRST HBM3 MEMORY ARCHITECTURE
Greatest Generational Leap in Memory Bandwidth
3 TB/s
5 HBM sites with 80 GB capacity
Dramatic improvement in HBM frequency
New DRAM controller with 2x independent channels maintains same high efficiency
[Chart: Delivered Bandwidth (TB/s), axis 0 to 4, for P100 (HBM2, 2016), V100 (HBM2, 2017), A100 (HBM2, 2020), H100 (HBM3, 2022); annotations: 2x DRAM Bandwidth, 1.7x, 1.6x]
Memory data rates not finalized and subject to change in the final product.

HOPPER H100 MULTI-INSTANCED GPUS
Faster and More Secure
Higher perf per MIG
  3X more compute capacity
  2X more memory bandwidth
Dedicated image and video decoders per MIG
Trusted Execution Environment per MIG
  GPU virtualization (PCIe SR-IOV)
  HW-based security for confidentiality and integrity
  HW firewalls for mem isolation between MIGs
Multi-Tenant, Single GPU Support
Secure MIG: 1 VM per GP