NVIDIA
CUDA 11 NEW FEATURES
Jingrong Zhang, GTC CHINA
#page#
AGENDA
NVIDIA A100 Highlights
Programming with CUDA 11
  Warp Synchronous Reduction
  L2 Cache Residency Control
  Asynchronous copy
  Asynchronous barrier
#page#
NVIDIA A100 HIGHLIGHTS
5 Miracles of NVIDIA A100
- 54B XTORS, HBM2
- 3rd Gen Tensor Cores: Faster, Flexible, Easier to use; 20x AI Perf with TF32
- New Sparsity Acceleration: Harness Sparsity in AI Models; 2x AI Performance
- Optimal utilization with right sized GPU; 7x Simultaneous instances per GPU
- 3rd Gen NVLINK and NVSWITCH: Efficient Scaling to Enable Super GPU; 2X More Bandwidth
#page#
AI ACCELERATION
[Figure: bar charts of relative performance for BERT-LARGE TRAINING (FP16 and FP32) and BERT-LARGE INFERENCE]
[Chart series: V100 and A100 (training); T4, V100, 1/7th A100, and A100 with 7 GPU Instances (inference)]
All results are measured. [BERT-Large training configuration text is illegible in the source.] V100 is DGX-1 Server with 8x V100, A100 is DGX A100 Server with 8x A100. A100 uses TF32 Tensor Core for FP32 training. BERT-Large Inference uses TRT 7.1 for T4/V100, with INT8/FP16 at batch size 256. Pre-production TRT for A100, uses batch size 94 and INT8 with sparsity.
#page#
MULTI-INSTANCE GPU (MIG)
- Simultaneous Workload Execution With Guaranteed Quality Of Service
- Different sized MIG instances based on target workloads
[Figure: GPU Instance diagram with labels GPU Instance, GPU Slice, GPU Engine, Compute/Memory, GPU Mem Slice, Graphics, SM Slice, Copy]
#page#
MULTI-INSTANCE GPU (MIG)
Partition
- Up To 7 GPU Instances
- The number of slices that a GI can be created with is not arbitrary.
- Users can create GIs by specifying one of these profiles

Profile Name   Fraction   Fraction    Memory      Number of   NVJPG and   Number of
               of SMs     of Memory   bandwidth   NVDECs      NVOFA       Instances available
MIG 1g.5gb     1/7        1/8         1/8         None        None        7
MIG 2g.10gb    2/7        2/8         1/4         1           None        3
MIG 3g.20gb    3/7        4/8         1/2         2           None        2
MIG 4g.20gb    4/7        4/8         1/2         2           None        1
MIG 7g.40gb    7/7        Full        Full        5           1           1
#page#
MULTI-INSTANCE GPU (MIG) Partition
[Figure: MIG partition diagram across GPCs; legible labels include "8 memory, 7 compute", "4 memory, 3 compute", and "2 memory"]
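
The profile table on the MIG Partition slide can be read as a small data structure. The host-side C++ sketch below encodes the five profiles and checks whether a requested mix of GPU instances stays within 7 compute slices and 8 memory slices, and within each profile's "instances available" count. The names `MigProfile`, `kProfiles`, `find_profile`, `fits_on_one_gpu`, and `self_check` are hypothetical and are not part of any NVIDIA API. The check is a simplification: it only counts slices and does not model the placement rules the driver applies when instances are actually created.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// One row of the A100 MIG profile table, as listed on the slide.
struct MigProfile {
    const char* name;
    int compute_slices;  // numerator of "Fraction of SMs" (out of 7)
    int memory_slices;   // numerator of "Fraction of Memory" (out of 8)
    int max_instances;   // "Number of Instances available"
};

// Values copied from the slide's table; "Full" memory is taken as 8/8.
constexpr std::array<MigProfile, 5> kProfiles = {{
    {"1g.5gb", 1, 1, 7},
    {"2g.10gb", 2, 2, 3},
    {"3g.20gb", 3, 4, 2},
    {"4g.20gb", 4, 4, 1},
    {"7g.40gb", 7, 8, 1},
}};

// Hypothetical helper: look a profile up by name; nullptr if unknown.
inline const MigProfile* find_profile(const std::string& name) {
    for (const auto& p : kProfiles) {
        if (name == p.name) return &p;
    }
    return nullptr;
}

// Simplified feasibility check (assumption: slice counts only,
// no placement constraints). Returns false for unknown profile names.
inline bool fits_on_one_gpu(const std::vector<std::string>& requested) {
    int compute = 0;
    int memory = 0;
    std::array<int, kProfiles.size()> count{};
    for (const auto& name : requested) {
        const MigProfile* p = find_profile(name);
        if (p == nullptr) return false;
        compute += p->compute_slices;
        memory += p->memory_slices;
        const auto idx = static_cast<std::size_t>(p - kProfiles.data());
        if (++count[idx] > p->max_instances) return false;
    }
    return compute <= 7 && memory <= 8;
}

// Usage sketch: call this from your own main().
inline void self_check() {
    assert(fits_on_one_gpu({"4g.20gb", "3g.20gb"}));              // 7 compute, 8 memory
    assert(!fits_on_one_gpu({"4g.20gb", "4g.20gb"}));             // only one 4g instance
    assert(!fits_on_one_gpu({"3g.20gb", "3g.20gb", "1g.5gb"}));   // 9 memory slices
}
```

Under these assumptions, a 3g.20gb instance uses half of the memory slices but only three of the seven compute slices, which is why two of them exhaust memory while leaving one compute slice unused.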