NVIDIA
HPC Application Performance Analysis and Tuning
Pengzhi Zhu, Dec 2020

AGENDA
- Introduction to the computational characteristics of HPC applications
- Key HPC acceleration technologies
- HPC-X + OpenMPI + UCX + HCOLL
- Application profiling and performance tuning

HIGH PERFORMANCE COMPUTING
Introduction to the Computational Characteristics of HPC Applications

How HPC and AI Compute
Computation method; data; workload distribution, computation, reduction
Input, method, framework:
- Weather data; mathematical-physical equations; MPI application
- Model data, e.g. mobile-phone materials and structures; mathematical-physical equations; MPI application
- Image files; convolutional neural network model; AI framework, back propagation
- Convolutional neural network model; distributed AI framework, stochastic gradient descent

What the API of an HPC Algorithm Framework (GROMACS) Looks Like
[Terminal screenshot in a (py38) environment: `cat README`, whose output mentions fmd.gro and a "Run command:" entry, followed by `head` on a GROMACS .gro structure file titled "Protein in water". The listing shows the atoms of residue 1 GLN (N, H1, H2, H3, CA, HA, CB, HB1, HB2, CG, HG1, HG2, CD, OE1, NE2, HE21, HE22) with position and velocity columns. The numeric values are not recoverable from the extraction.]

Computation + Communication Flow in an HPC Algorithm Framework
[Figure: CPU and GPU execution timeline, taken from the GPU acceleration page on www.gromacs.org. The figure labels are not recoverable from the extraction.]

MPI Communication in HPC Parallel Code
[Screenshot: GROMACS 2020 source code]

Collective Communication in HPC-AI Compute Clusters
[Figure: All-Reduce across GPU0, GPU1, GPU2, and GPU3]

RDMA: Remote Direct Memory Access
[Figure: two hosts, one in Rack 1 and one in Rack 2, each with User (Application), Kernel (OS), and Hardware (HCA) layers. Buffer 1 is shown at each layer. The two data paths compared are RDMA over InfiniBand and TCP/IP.]

NVIDIA GPUDirect RDMA Acceleration Technology
[Figure, two panels with System Memory, CPU, Chipset, GPU, and GPU Memory. Without GPUDirect RDMA: the same data is copied 3x on its way to the network. With GPUDirect RDMA (requires RDMA/RoCE).]
- Designed to accelerate deep learning training
- Provides the lowest communication latency for CUDA accelerator cards
- Eliminates unnecessary system memory