Deep Dive into Nsight Systems and Nsight Compute: Performance Analysis and Optimization Tools (PDF, 89 pages)
NVIDIA
DEEP DIVE INTO NSIGHT SYSTEMS & NSIGHT COMPUTE
Bing Liu, 2020/12
#page#
AGENDA
Overview of Profilers
Nsight Systems
Nsight Compute
Case Studies
Summary
#page#
NSIGHT PRODUCT FAMILY
[Workflow diagram]
Start here: Nsight Systems (comprehensive workload-level performance)
Dive into CUDA kernels: Nsight Compute (detailed CUDA kernel performance)
Dive into graphics frames: Nsight Graphics
Re-check overall (label appears twice in the diagram)
Finished if performance is satisfactory
#page#
NSIGHT SYSTEMS
Overview
System-wide application algorithm tuning
Focus on the application's algorithm: a unique perspective
Locate optimization opportunities
See gaps of unused CPU and GPU time
Balance your workload across multiple CPUs and GPUs
CPU algorithms, utilization, and thread state
GPU streams, kernels, memory transfers, etc.
Support for Linux & Windows, x86-64 & Tegra. Host only for Mac
#page#
NSIGHT SYSTEMS
Key Features
Compute
CUDA API. Kernel launch and execution correlation
Libraries: cuBLAS, cuDNN, TensorRT
OpenACC
Graphics
Vulkan, OpenGL, DX11, DX12, DXR, V-sync
OS
Thread state and CPU utilization, pthread, file I/O, etc.
User annotations API (NVTX)
#page#
[Figure: Nsight Systems timeline view. Callouts: thread/core migration; processes and threads; thread state; CUDA and OpenGL API trace; cuDNN and cuBLAS trace; kernel and memory transfer activities]
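The "User annotations API (NVTX)" item under Key Features can be illustrated with a minimal Python sketch. It assumes the optional `nvtx` Python package, which the slides do not mention, and falls back to a no-op when that package is missing. The function names `load_batch`, `train_step`, and `run`, and the range names, are hypothetical.

```python
import contextlib

try:
    import nvtx  # optional package; assumption: installed separately (e.g. pip install nvtx)

    def annotate(name):
        # nvtx.annotate can be used as a context manager that opens
        # a named range on entry and closes it on exit.
        return nvtx.annotate(name)
except ImportError:
    def annotate(name):
        # Fallback so the sketch still runs without NVTX: nothing is recorded.
        return contextlib.nullcontext()


def load_batch(n):
    # Hypothetical CPU-side stage.
    with annotate("load_batch"):
        return list(range(n))


def train_step(batch):
    # Hypothetical compute stage; a real one would launch CUDA kernels here.
    with annotate("train_step"):
        return sum(x * x for x in batch)


def run(steps=3, n=1000):
    results = []
    for _ in range(steps):
        # Nested ranges: one outer range per iteration, inner ranges per stage.
        with annotate("iteration"):
            results.append(train_step(load_batch(n)))
    return results


if __name__ == "__main__":
    print(run())
```

When a script like this is profiled with NVTX tracing enabled, for example `nsys profile --trace=cuda,nvtx,osrt python app.py` (where `app.py` is a placeholder name), the named ranges would be expected to appear as their own row on the timeline, next to the CUDA API and kernel rows.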
[Figure callout, continued from the timeline view: Multi-GPU]
#page#
CPU THREADS
Thread Activities
Get an overview of each thread's activities
Which core the thread is running on and the utilization
CPU state and transition
OS runtime libraries usage: pthread, file I/O, etc.
API usage: CUDA, cuDNN, cuBLAS, TensorRT
[Figure: timeline rows for a python thread. OS runtime libraries; CUDA API (cuEventSynchronize); cuDNN; cuBLAS]
#page#
OS RUNTIME LIBRARIES
Identify time periods where threads are blocked and the reason
Locate potentially redundant synchronizations
[Figure: timeline rows for a python process. OS runtime libraries: sem_wait, pthread_cond_wait, fgetc. CUDA API.]
pthread_cond_wait tooltip: Begins: 9.23185s, Ends: 9.2336
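To make "time periods where threads are blocked" concrete, here is a small Python sketch, standard library only, of a worker thread that blocks on a semaphore until the main thread releases it. On Linux, CPython typically implements such waits with calls like `sem_wait` or `pthread_cond_wait`, the kind of call named in the OS runtime libraries row; that mapping is an assumption about the interpreter build, not something the slides state. The names `worker` and `measure_blocked_time` are hypothetical.

```python
import threading
import time


def measure_blocked_time(delay_s=0.05):
    """Return how long a worker thread stayed blocked waiting on a semaphore."""
    gate = threading.Semaphore(0)  # counter starts at 0, so acquire() blocks
    blocked = {}

    def worker():
        start = time.perf_counter()
        gate.acquire()  # the thread sleeps here until release() is called
        blocked["seconds"] = time.perf_counter() - start

    t = threading.Thread(target=worker)
    t.start()
    time.sleep(delay_s)  # main thread is busy elsewhere; the worker stays blocked
    gate.release()       # unblock the worker
    t.join()
    return blocked["seconds"]


if __name__ == "__main__":
    print(f"worker blocked for {measure_blocked_time():.3f} s")
```

In a profile, a wait like this would be expected to show up as one long blocking call on the worker's OS runtime row. If the work it waits for has already completed by other means, the wait is a candidate for the "potentially redundant synchronizations" the slide mentions.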