NVIDIA GPU ACCELERATION IN PYTHON
Dominic Wang | Solution Architect
GTC CHINA

AGENDA
Getting Started
Background
Testing Setup
Numba Code
  Step through Numba modifications
CuPy Code
  Step through CuPy modifications

AVERAGE USERS
[Chart: active developers by language, Python vs. C/C++; source link truncated]

WHY ARE WE HERE?
“I am a Python developer but really need the performance of CUDA C++.”
“I have custom arithmetic, e.g., SciPy, that doesn't exist in other GPU-accelerated packages, e.g., CuPy.”
“I have custom Numba kernels and I'm nervous about porting code to CuPy's RawKernel.”
“Are there any improvements that can be made to my current Numba/CuPy code?”

GETTING STARTED
Drop-in GPU Library Replacements
  NumPy → CuPy
  Pandas → cuDF
  Scikit-Learn → cuML
  NetworkX → cuGraph
  Pros: Trivial code change; “Free” performance
  Cons: Potentially sub-optimal; Limited control
Custom Numba CUDA Kernels
  Leverage JIT compilation and Numba's CUDA support to quickly build and test custom CUDA kernels with a Pythonic API.
  Pros: Quickly build custom features; Boilerplate code
  Cons: JIT compilation overhead; Excess register pressure
Custom Raw CUDA Kernels
  To match native CUDA speeds, wrap raw CUDA kernels in CuPy; precompile and cache the kernel to avoid JIT overhead.
  Pros: Matches CUDA C++ speed; No excess software layer
  Cons: Limited debugging tools; Must support multiple dtypes
Source: GPU Accelerating SciPy Signal with Numba and CuPy | SciPy 2020

TESTING
Find and run the code: https:/ fall
  conda env create -f gtc_fall.yml
  bash test_script.sh
Input size: 2^10; output size: 2^20
Performed on a DGX-1:
  Tesla V100-SXM2-32GB
  Intel Xeon CPU E5-2698 v4 @ 2.2 GHz
Setting GPU:
  sudo nvidia-smi -ac 877,1530 -i 0   # Set application clocks (memory, graphics in MHz)
  sudo nvidia-smi -pl 250 -i 0        # Set power limit (W)

PYTHON CODE
SciPy (lombscargle), excerpt:
    for i in range(freqs.shape[0]):
        xc = 0.
        xs = 0.
        cc = 0.
        ...
        tau = atan2(2*cs, cc - ss) / (2*freqs[i])
        c_tau = cos(freqs[i]*tau)
        s_tau = sin(freqs[i]*tau)
        ...
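The loop excerpt above is cut off before the inner accumulation and the final periodogram value. A complete serial version can be sketched as follows. It is written from the standard Lomb-Scargle formulas in the classic loop form that SciPy's Cython lombscargle has historically used, with angular frequencies in freqs. It keeps the names visible in the excerpt (xc, xs, cc, ss, cs, tau, c_tau, s_tau); the function name lombscargle_loop and the names w, c_tau2, s_tau2, cs_tau and pgram are hypothetical. Treat it as a CPU baseline for checking GPU ports, not as SciPy's source or the talk's code.

```python
import math

import numpy as np


def lombscargle_loop(x, y, freqs):
    """Serial Lomb-Scargle periodogram (hypothetical helper for illustration).

    x: sample times, y: sample values, freqs: angular frequencies.
    One outer iteration per frequency, one inner iteration per sample.
    """
    pgram = np.empty(freqs.shape[0], dtype=np.float64)
    for i in range(freqs.shape[0]):
        w = freqs[i]
        xc = xs = cc = ss = cs = 0.0
        # Inner loop: accumulate the trigonometric sums over all samples.
        for j in range(x.shape[0]):
            c = math.cos(w * x[j])
            s = math.sin(w * x[j])
            xc += y[j] * c
            xs += y[j] * s
            cc += c * c
            ss += s * s
            cs += c * s
        # Time offset tau makes the shifted sine and cosine terms orthogonal.
        tau = math.atan2(2.0 * cs, cc - ss) / (2.0 * w)
        c_tau = math.cos(w * tau)
        s_tau = math.sin(w * tau)
        c_tau2 = c_tau * c_tau
        s_tau2 = s_tau * s_tau
        cs_tau = 2.0 * c_tau * s_tau
        # Power = half the squared projection of y onto the two shifted basis terms.
        pgram[i] = 0.5 * (
            (c_tau * xc + s_tau * xs) ** 2 / (c_tau2 * cc + cs_tau * cs + s_tau2 * ss)
            + (c_tau * xs - s_tau * xc) ** 2 / (c_tau2 * ss - cs_tau * cs + s_tau2 * cc)
        )
    return pgram


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 10.0, 200))  # unevenly spaced sample times
    y = np.sin(2.0 * x)                         # pure tone at angular frequency 2.0
    freqs = np.linspace(0.5, 4.0, 351)          # angular frequencies to evaluate
    p = lombscargle_loop(x, y, freqs)
    print("peak at angular frequency", freqs[np.argmax(p)])
```

Each outer iteration is independent of the others, which is why a GPU port can assign one thread per frequency and keep the inner sums in registers.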
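The drop-in route from the Getting Started slide (NumPy → CuPy) can be illustrated with the Lomb-Scargle computation from the excerpt above. This is a minimal sketch, assuming CuPy mirrors the NumPy calls used here (asarray, cos, sin, arctan2, matrix multiply, sum), which is the premise of that slide row. The function name lombscargle_vectorized, the xp parameter and DEFAULT_XP are hypothetical names chosen for illustration; this is not code from the talk.

```python
import numpy as np

try:
    import cupy as cp  # optional GPU array module
except ImportError:
    cp = None

# Drop-in choice of array module: CuPy when it is installed, otherwise NumPy.
DEFAULT_XP = cp if cp is not None else np


def lombscargle_vectorized(x, y, freqs, xp=np):
    """Lomb-Scargle periodogram written only against the array module `xp`.

    freqs are angular frequencies. Every (frequency, sample) pair is
    materialized, so memory grows as len(freqs) * len(x).
    """
    x = xp.asarray(x, dtype=xp.float64)
    y = xp.asarray(y, dtype=xp.float64)
    w = xp.asarray(freqs, dtype=xp.float64)
    arg = w[:, None] * x[None, :]  # shape (n_freqs, n_samples)
    c = xp.cos(arg)
    s = xp.sin(arg)
    # Per-frequency sums, replacing the inner per-sample loop.
    xc = c @ y
    xs = s @ y
    cc = (c * c).sum(axis=1)
    ss = (s * s).sum(axis=1)
    cs = (c * s).sum(axis=1)
    tau = xp.arctan2(2.0 * cs, cc - ss) / (2.0 * w)
    c_tau = xp.cos(w * tau)
    s_tau = xp.sin(w * tau)
    c_tau2 = c_tau * c_tau
    s_tau2 = s_tau * s_tau
    cs_tau = 2.0 * c_tau * s_tau
    return 0.5 * (
        (c_tau * xc + s_tau * xs) ** 2 / (c_tau2 * cc + cs_tau * cs + s_tau2 * ss)
        + (c_tau * xs - s_tau * xc) ** 2 / (c_tau2 * ss - cs_tau * cs + s_tau2 * cc)
    )
```

With CuPy installed, lombscargle_vectorized(x, y, freqs, xp=DEFAULT_XP) runs on the GPU and returns a CuPy array, which cupy.asnumpy copies back to host memory. The trade-off matches the slide's drop-in column: no kernel code to write, but little control over temporaries. If the test sizes above mean 2^10 samples and 2^20 frequencies, each (frequency, sample) temporary holds 2^30 float64 values (8 GiB), which is where a custom Numba or raw CUDA kernel becomes attractive.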