NVIDIA GPU ACCELERATION IN PYTHON
Dominic Wang | Solution Architect
GTC CHINA

AGENDA
Getting Started: background and testing setup
Numba Code: step through Numba modifications
CuPy Code: step through CuPy modifications

AVERAGE USERS
[Chart: number of active developers by language, comparing Python and C/C++; source URL truncated in the original.]
WHY ARE WE HERE?
"I am a Python developer but really need the performance of CUDA C++."
"I have custom arithmetic, e.g. in SciPy, that doesn't exist in a GPU-accelerated package such as CuPy."
"I have custom Numba kernels and I'm nervous about porting code to CuPy's RawKernel."
"Are there any improvements that can be made to my current Numba/CuPy code?"
GETTING STARTED
Three ways onto the GPU:

Drop-in GPU Library Replacements
NumPy -> CuPy
Pandas -> cuDF
Scikit-Learn -> cuML
NetworkX -> cuGraph
Pros: trivial code change; "free" performance (a swap is sketched below)
Cons: limited control

Custom Numba CUDA Kernels
Leverage JIT compilation and Numba's CUDA support to quickly build and test custom CUDA kernels with a Pythonic API.
Pros: quickly build custom features; support multiple dtypes
Cons: JIT compilation overhead; potentially sub-optimal; excess register pressure

Custom Raw CUDA Kernels
To match native CUDA speeds, wrap raw CUDA kernels in CuPy; precompile and cache the kernel to avoid JIT overhead.
Pros: matches CUDA C++ speed; no excess SW layer
Cons: boilerplate code; limited debugging tools

See: "GPU Accelerating SciPy Signal with Numba and CuPy", SciPy 2020.
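A minimal sketch of the first approach, the drop-in swap. This example is illustrative only; the operation and array size are assumptions, not from the talk:

import numpy as np
import cupy as cp

a = np.random.rand(1 << 20).astype(np.float32)

# CPU: NumPy
cpu_sorted = np.sort(a)

# GPU: CuPy mirrors the NumPy API; only the array module changes
d_a = cp.asarray(a)                  # host -> device copy
gpu_sorted = cp.sort(d_a)

np.testing.assert_array_equal(cpu_sorted, cp.asnumpy(gpu_sorted))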
TESTING
Find and run the code (repository URL truncated in the original):

conda env create -f gtc_fall.yml
bash test_script.sh

Input size = 2^10; output size = 2^20.
Performed on a DGX-1: Tesla V100-SXM2-32GB, Intel Xeon CPU E5-2698 v4 @ 2.20GHz.

Setting the GPU:

sudo nvidia-smi -ac 877,1530 -i 0   # Set clocks
sudo nvidia-smi -pl 250 -i 0        # Set power levels

PYTHON CODE
SciPy (Lomb-Scargle)

for i in range(freqs.shape[0]):
    xc = 0.
    xs = 0.
    cc = 0.
    ss = 0.
    cs = 0.
    for j in range(x.shape[0]):
        c = cos(freqs[i] * x[j])
        s = sin(freqs[i] * x[j])
        xc += y[j] * c
        xs += y[j] * s
        cc += c * c
        ss += s * s
        cs += c * s
    tau = atan2(2 * cs, cc - ss) / (2 * freqs[i])
    c_tau = cos(freqs[i] * tau)
    s_tau = sin(freqs[i] * tau)
    c_tau2 = c_tau * c_tau
    s_tau2 = s_tau * s_tau
    cs_tau = 2 * c_tau * s_tau

    pgram[i] = 0.5 * (((c_tau * xc + s_tau * xs)**2 /
                       (c_tau2 * cc + cs_tau * cs + s_tau2 * ss)) +
                      ((c_tau * xs - s_tau * xc)**2 /
                       (c_tau2 * ss - cs_tau * cs + s_tau2 * cc)))

return pgram
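The test harness itself is not shown on the slides. A plausible minimal setup matching the stated sizes; the waveform, seed, and frequency range are assumptions:

import numpy as np
from numba import cuda
from scipy import signal

rng = np.random.default_rng(42)
x = np.sort(rng.random(2**10) * 10.0)      # 2**10 sample times
y = np.cos(2 * np.pi * 0.25 * x)           # observed signal
f = np.linspace(0.01, 10.0, 2**20)         # 2**20 trial angular frequencies

cpu_pgram = signal.lombscargle(x, y, f)    # CPU baseline

# Device copies used by the GPU versions that follow
d_x = cuda.to_device(x)
d_y = cuda.to_device(y)
d_f = cuda.to_device(f)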
PROFILING
NVTX and Nsight Systems

from cupy import prof

# Run baseline with scipy.signal.lombscargle
with prof.time_range("scipy_lombscargle", 0):
    cpu_lombscargle = signal.lombscargle(x, y, f)

with prof.time_range("numba_lombscargle", 1):
    gpu_lombscargle = lombscargle(d_x, d_y, d_f)

# Copy result to host
gpu_lombscargle = gpu_lombscargle.copy_to_host()

# Compare results
np.testing.assert_allclose(cpu_lombscargle, gpu_lombscargle, 1e-3)

# Run multiple passes to get average
for _ in range(loops):
    with prof.time_range("numba_lombscargle_loop", 2):
        gpu_lombscargle = lombscargle(d_x, d_y, d_f)
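When a full Nsight Systems trace is not needed, CUDA events give quick wall-clock numbers from Python. This helper is not from the talk; a minimal sketch using CuPy's event API:

import cupy as cp

def gpu_time_ms(fn, *args, loops=100):
    """Average GPU time of fn(*args) in ms, measured with CUDA events."""
    start, end = cp.cuda.Event(), cp.cuda.Event()
    fn(*args)                      # warm up / trigger JIT compilation
    start.record()
    for _ in range(loops):
        fn(*args)
    end.record()
    end.synchronize()
    return cp.cuda.get_elapsed_time(start, end) / loops

# e.g. avg_ms = gpu_time_ms(lombscargle, d_x, d_y, d_f)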
NUMBA CUSTOM KERNEL

NUMBA CODE
Baseline

def _numba_lombscargle(x, y, freqs, pgram, y_dot):
    # New code: grid-stride setup
    F = cuda.grid(1)
    strideF = cuda.gridsize(1)

    # New code: normalization factor
    if not y_dot[0]:
        yD = 1.0
    else:
        yD = 2.0 / y_dot[0]

    for i in range(F, freqs.shape[0], strideF):
        # Copy data to registers
        freq = freqs[i]

        xc = 0.0
        xs = 0.0
        cc = 0.0
        ss = 0.0
        cs = 0.0

        for j in range(x.shape[0]):
            c = cos(freq * x[j])
            s = sin(freq * x[j])
            xc += y[j] * c
            xs += y[j] * s
            cc += c * c
            ss += s * s
            cs += c * s

        tau = atan2(2.0 * cs, cc - ss) / (2.0 * freq)
        c_tau = cos(freq * tau)
        s_tau = sin(freq * tau)
        c_tau2 = c_tau * c_tau
        s_tau2 = s_tau * s_tau
        cs_tau = 2.0 * c_tau * s_tau
        # ... (periodogram write continues as in the SciPy version,
        #      scaled by yD; truncated on the slide)

NUMBA CODE
Version 1

def _lombscargle(x, y, freqs, pgram, y_dot):
    # Allow for multiple kernels based on data type
    if pgram.dtype == 'float32':
        numba_type = float32
    elif pgram.dtype == 'float64':
        numba_type = float64

    # Determine number of blocks for grid-stride looping
    device_id = cp.cuda.Device()
    numSM = device_id.attributes["MultiProcessorCount"]
    threadsperblock = (128,)
    blockspergrid = (numSM * 20,)

    # Compile Numba kernel
    sig = _numba_lombscargle_signature(numba_type)
    kernel = cuda.jit(sig)(_numba_lombscargle)

    # Launch Numba kernel
    kernel[blockspergrid, threadsperblock](x, y, freqs, pgram, y_dot)

    # Block host until finished
    cuda.synchronize()
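The helper _numba_lombscargle_signature is called above but never shown on the slides. A plausible definition, offered as an assumption rather than the talk's actual code:

from numba import void

def _numba_lombscargle_signature(ty):
    # void(x, y, freqs, pgram, y_dot): five 1-D C-contiguous arrays of ty
    return void(ty[::1], ty[::1], ty[::1], ty[::1], ty[::1])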
NUMBA COMPARISON

[Table: First Pass (ms), Average (ms), Registers, and Speed Up, for double and single precision; rows: SciPy, Numba (Baseline), Numba (UserCache), Numba (DataType), Numba (FastMath), Numba (MaxRegisters). The SciPy baseline is roughly 3,691 ms first pass / 3,680 ms average in double precision and 3,804 ms / 3,810 ms in single precision; the remaining cell values were scrambled in extraction.]

Baseline Numba kernel; implicit casting on single precision.

NUMBA CODE
Cached Kernel

def _lombscargle(x, y, freqs, pgram, y_dot):
    if pgram.dtype == 'float32':
        numba_type = float32
    elif pgram.dtype == 'float64':
        numba_type = float64

    # Check if a compiled kernel already exists
    if str(numba_type) in _kernel_cache:
        kernel = _kernel_cache[str(numba_type)]
    else:
        sig = _numba_lombscargle_signature(numba_type)
        kernel = _kernel_cache[str(numba_type)] = cuda.jit(sig)(_numba_lombscargle)

    device_id = cp.cuda.Device()
    numSM = device_id.attributes["MultiProcessorCount"]
    threadsperblock = (128,)
    blockspergrid = (numSM * 20,)

    kernel[blockspergrid, threadsperblock](x, y, freqs, pgram, y_dot)

NUMBA COMPARISON

[Table: same layout, UserCache row added; cell values scrambled in extraction.]

Cached the compiled kernel in a user-defined dictionary, skipping Numba's compilation logic on subsequent calls.

NUMBA COMPARISON

[Table: same layout, DataType row added; cell values scrambled in extraction.]

Added data-type casting to the kernel to minimize register usage.
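The per-dtype kernel change itself did not survive extraction. A self-contained miniature of the idea (not the talk's kernel): in the float32 variant, every literal is wrapped in float32(...) so arithmetic stays in single precision; a bare 2.0 would promote intermediates to float64 and cost extra registers.

import numpy as np
from numba import cuda, float32

def _scale_32(x, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        # Typed literals keep the whole expression in float32
        out[i] = float32(2.0) * x[i] + float32(0.5)

kernel_32 = cuda.jit("void(float32[::1], float32[::1])")(_scale_32)

x = cuda.to_device(np.arange(1024, dtype=np.float32))
out = cuda.device_array_like(x)
kernel_32[80, 128](x, out)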
NUMBA CODE
Using -use_fast_math

def _lombscargle(x, y, freqs, pgram, y_dot):
    if pgram.dtype == 'float32':
        numba_type = float32
    elif pgram.dtype == 'float64':
        numba_type = float64

    if str(numba_type) in _kernel_cache:
        kernel = _kernel_cache[str(numba_type)]
    else:
        sig = _numba_lombscargle_signature(numba_type)
        # Add fast math flag
        if pgram.dtype == 'float32':
            kernel = _kernel_cache[str(numba_type)] = cuda.jit(
                sig, fastmath=True)(_numba_lombscargle_32)
        elif pgram.dtype == 'float64':
            kernel = _kernel_cache[str(numba_type)] = cuda.jit(
                sig, fastmath=True)(_numba_lombscargle_64)

    kernel[blockspergrid, threadsperblock](x, y, freqs, pgram, y_dot)

NUMBA COMPARISON

[Table: same layout, FastMath row added; cell values scrambled in extraction.]

Passed the -use_fast_math flag; only effective on single precision.
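A self-contained sketch (not the talk's code) of why fast math only pays off in single precision: it swaps the single-precision math functions for approximate hardware intrinsics, so results drift slightly and should be checked with a loosened tolerance.

from math import sin
import numpy as np
from numba import cuda

def _sin_kernel(x, out):
    start = cuda.grid(1)
    stride = cuda.gridsize(1)
    for i in range(start, x.shape[0], stride):
        out[i] = sin(x[i])

sig = "void(float32[::1], float32[::1])"
exact = cuda.jit(sig)(_sin_kernel)
fast = cuda.jit(sig, fastmath=True)(_sin_kernel)

x = np.linspace(0, 2 * np.pi, 1 << 20, dtype=np.float32)
d_x = cuda.to_device(x)
d_e = cuda.device_array_like(d_x)
d_f = cuda.device_array_like(d_x)
exact[160, 128](d_x, d_e)
fast[160, 128](d_x, d_f)

# Fast-math results should agree to within a loose absolute tolerance
np.testing.assert_allclose(d_e.copy_to_host(), d_f.copy_to_host(), atol=1e-5)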
NUMBA CODE
Using -max_registers

def _lombscargle(x, y, freqs, pgram, y_dot):
    if pgram.dtype == 'float32':
        numba_type = float32
    elif pgram.dtype == 'float64':
        numba_type = float64

    if str(numba_type) in _kernel_cache:
        kernel = _kernel_cache[str(numba_type)]
    else:
        sig = _numba_lombscargle_signature(numba_type)
        # Cap registers per thread; different limits per precision
        if pgram.dtype == 'float32':
            kernel = _kernel_cache[str(numba_type)] = cuda.jit(
                sig, fastmath=True, max_registers=32)(_numba_lombscargle_32)
        elif pgram.dtype == 'float64':
            kernel = _kernel_cache[str(numba_type)] = cuda.jit(
                sig, fastmath=True, max_registers=64)(_numba_lombscargle_64)

    kernel[blockspergrid, threadsperblock](x, y, freqs, pgram, y_dot)
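Why cap registers? A back-of-the-envelope occupancy check, using V100 limits and the register counts that appear in the comparison tables; the helper and exact arithmetic are illustrative assumptions (register allocation granularity is ignored):

# V100: 65,536 32-bit registers per SM, max 2,048 resident threads per SM
regs_per_sm = 65536
max_threads_per_sm = 2048

def resident_threads(regs_per_thread, threads_per_block=128):
    blocks = regs_per_sm // (regs_per_thread * threads_per_block)
    return min(blocks * threads_per_block, max_threads_per_sm)

# At 58 registers/thread only ~1,024 threads fit per SM;
# capping at 32 registers allows the full 2,048.
print(resident_threads(58))  # -> 1024
print(resident_threads(32))  # -> 2048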
CUPY RAW KERNEL

CUPY CODE
Baseline

Stored as a string, templated on the data type:

_cupy_lombscargle_src = Template("""
extern "C" __global__ void _cupy_lombscargle(
        const int x_shape,
        const int freqs_shape,
        const ${datatype} * __restrict__ x,
        const ${datatype} * __restrict__ y,
        const ${datatype} * __restrict__ freqs,
        ${datatype} * __restrict__ pgram,
        const ${datatype} * __restrict__ y_dot) {

    const int tx =
        static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);
    const int stride = static_cast<int>(blockDim.x * gridDim.x);

    // Explicitly specify data types
    ${datatype} yD;
    if (y_dot[0] == 0) {
        yD = 1.0;
    } else {
        yD = 2.0 / y_dot[0];
    }

    for (int tid = tx; tid < freqs_shape; tid += stride) {
        ${datatype} freq = freqs[tid];
        ${datatype} xc, xs, cc, ss, cs;
        ${datatype} c, s;
        // ... (body continues as in the Numba version; truncated on the slide)
""")

CUPY COMPARISON

[Table: First Pass (ms), Average (ms), Registers, and Speed Up, for double and single precision; rows: SciPy, CuPy, CuPy (UserCache), CuPy (DataType), CuPy (Fast Math), CuPy (Fatbin), CuPy (LaunchBounds); cell values scrambled in extraction.]

Baseline CuPy raw kernel; note the absence of type promotion.
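A self-contained miniature of the same pattern (a hypothetical saxpy, not the talk's kernel): render the C type into the source with string.Template, compile with RawModule, and launch with an explicit (grid, block, args) triple.

import cupy as cp
from string import Template

_saxpy_src = Template("""
extern "C" __global__ void saxpy(const int n,
                                 const ${datatype} a,
                                 const ${datatype} * __restrict__ x,
                                 ${datatype} * __restrict__ y) {
    const int tx = static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);
    const int stride = static_cast<int>(blockDim.x * gridDim.x);
    for (int tid = tx; tid < n; tid += stride) {
        y[tid] += a * x[tid];
    }
}
""")

module = cp.RawModule(code=_saxpy_src.substitute(datatype="float"),
                      options=("-std=c++11",))
saxpy = module.get_function("saxpy")

n = 1 << 20
x = cp.arange(n, dtype=cp.float32)
y = cp.zeros(n, dtype=cp.float32)
# Scalar arguments are passed as typed NumPy scalars
saxpy((80,), (128,), (cp.int32(n), cp.float32(2.0), x, y))
assert cp.allclose(y, 2 * x)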
CUPY CODE
Cached Kernel

def _lombscargle(x, y, freqs, pgram, y_dot):
    if pgram.dtype == 'float32':
        c_type = "float"
    elif pgram.dtype == 'float64':
        c_type = "double"

    # Check if a compiled kernel already exists
    if str(c_type) in _kernel_cache:
        kernel = _kernel_cache[str(c_type)]
    else:
        src = _cupy_lombscargle_src.substitute(datatype=c_type)
        module = cp.RawModule(code=src, options=("-std=c++11",))
        kernel = _kernel_cache[str(c_type)] = module.get_function("_cupy_lombscargle")

    kernel_args = (x.shape[0], freqs.shape[0], x, y, freqs, pgram, y_dot)
    kernel(blockspergrid, threadsperblock, kernel_args)

CUPY COMPARISON

[Table: same layout, UserCache row added; cell values scrambled in extraction.]

Cached the compiled kernel in a user-defined dictionary, skipping source rendering and RawModule creation on subsequent calls.
CUPY CODE
Explicit kernel per type

extern "C" __global__ void _cupy_lombscargle_float32(
        const int x_shape,
        const int freqs_shape,
        const float * __restrict__ x,
        const float * __restrict__ y,
        const float * __restrict__ freqs,
        float * __restrict__ pgram,
        const float * __restrict__ y_dot) {

    const int tx =
        static_cast<int>(blockIdx.x * blockDim.x + threadIdx.x);
    const int stride = static_cast<int>(blockDim.x * gridDim.x);

    // Explicitly specify data types: single-precision literals
    float yD;
    if (y_dot[0] == 0) {
        yD = 1.0f;
    } else {
        yD = 2.0f / y_dot[0];
    }

    for (int tid = tx; tid < freqs_shape; tid += stride) {
        float freq = freqs[tid];
        float xc, xs, cc, ss, cs;
        float c, s;
        // ... (body continues; truncated on the slide)

CUPY COMPARISON

[Table: same layout, DataType row added; cell values scrambled in extraction.]

Added data-type casting to the kernel to minimize register usage.
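One hypothetical middle ground between the templated and hand-written versions, not shown in the talk: render and compile both specializations up front, assuming the kernel name in the template is also parameterized (e.g. _cupy_lombscargle_${name}).

import cupy as cp

_kernel_cache = {}
for name, c_type in (("float32", "float"), ("float64", "double")):
    # Assumes _cupy_lombscargle_src takes both $datatype and $name
    src = _cupy_lombscargle_src.substitute(datatype=c_type, name=name)
    module = cp.RawModule(code=src, options=("-std=c++11",))
    _kernel_cache[name] = module.get_function("_cupy_lombscargle_" + name)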
33、7.69(DataType)CuPy(Fast Math)CuPy(FatbirCuPyAdd data type casting to kernel to minimize register usage#page#CUPY CODEUsing -use_fast mathdef _lombscargle(x,y,freqs,pgram,y_dot)if(str(pgram.dtype)in _kernel_cache:kernel = _kernel_cache(str(pgramdtype)else:Addmodule = cp.RawModule(code=_cupy_lombscarg
34、le_src, options=(-std=c+11”,“-use_fast_math”)fastkernel_cache(str(pgram.dtype)= moduleget_function(_cupy_lombscargle_” + str(pgram.dtype)mathkernel = _kernel_cache(str(pgram.dtype)flagdevice_id=cp.cuda.Device()numSM = device_id.attributestMultiProcessorCountthreadsperblock=(128,)blockspergrid =(numS
35、M *20,)kernel_args=(x.shapee,freqs.shapeex,y,freqs,pgram,y_dot,)kernel(blockspergrid,threadsperblock,kernel_args)Source:https:/gitUDy_V4.D#page#CUPY COMPARISONDouble PrecisionSingle PrecisionFirst PassFirst PassAverageAverageRegistersSpeed UpRegistersSpeed Up(ms)(ms)(ms)(ms)1.00SciPy3,803.63,809.63,
36、691.13,679.51.00CuPy582.0361.7113.4108.72123.031820.48CuPy581.636117.90.9109.32456.034171.66(UserCace)CuPy5899.51.632109.20.92408.454077.69(DataType)CuPy3258107.2107.60.21.62415.2116026.22(Fast Math)CuPy(FatbirCuPyPass-use_fast_math flag,only effectiveon single-precision.#page#CUPY CODELoading from
CUPY CODE
Loading from fatbin

def _lombscargle(x, y, freqs, pgram, y_dot):
    if str(pgram.dtype) in _kernel_cache:
        kernel = _kernel_cache[str(pgram.dtype)]
    else:
        # Load precompiled kernels from fatbin
        module = cp.RawModule(path="./_lombscargle.fatbin")
        kernel = _kernel_cache[str(pgram.dtype)] = module.get_function(
            "_cupy_lombscargle_" + str(pgram.dtype))

    device_id = cp.cuda.Device()
    numSM = device_id.attributes["MultiProcessorCount"]
    threadsperblock = (128,)
    blockspergrid = (numSM * 20,)

    kernel_args = (x.shape[0], freqs.shape[0], x, y, freqs, pgram, y_dot)
    kernel(blockspergrid, threadsperblock, kernel_args)
CUPY CODE
Loading from fatbin (continued)

nvcc -fatbin -std=c++11 -use_fast_math \
    -generate-code arch=compute_35,code=sm_35 \
    -generate-code arch=compute_35,code=sm_37 \
    -generate-code arch=compute_50,code=sm_50 \
    -generate-code arch=compute_50,code=sm_52 \
    -generate-code arch=compute_53,code=sm_53 \
    -generate-code arch=compute_60,code=sm_60 \
    -generate-code arch=compute_62,code=sm_62 \
    -generate-code arch=compute_70,code=sm_70 \
    -generate-code arch=compute_72,code=sm_72 \
    -generate-code arch=compute_75,code=[sm_75,compute_75] \
    _lombscargle.cu -odir .

Compile SASS for all target architectures; compile PTX only for 7.5 (the last entry), so newer GPUs can still JIT from it.
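A small sanity check before loading the fatbin, offered as a hypothetical addition: confirm the running GPU is covered by the architectures baked in above.

import cupy as cp

cc = int(cp.cuda.Device().compute_capability)  # e.g. 70 on a V100
fatbin_sass = {35, 37, 50, 52, 53, 60, 62, 70, 72, 75}
if cc not in fatbin_sass and cc < 75:
    raise RuntimeError(
        f"no SASS for sm_{cc}, and PTX is only embedded for compute_75")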
CUPY COMPARISON

[Table: same layout, Fatbin row added; cell values scrambled in extraction.]

Loading the precompiled fatbin removes the runtime compilation cost from the first pass.

CUPY CODE
Using __launch_bounds__

// Template wrapper
template<typename T>
__device__ void _cupy_lombscargle(
        const int x_shape,
        const int freqs_shape,
        const T * __restrict__ x,
        const T * __restrict__ y,
        const T * __restrict__ freqs,
        T * __restrict__ pgram,
        const T * __restrict__ y_dot) {
    // ... kernel body as before ...
}

// __launch_bounds__ matches threads per block (128)
extern "C" __global__ void __launch_bounds__(128) _cupy_lombscargle_float64(
        const int x_shape,
        const int freqs_shape,
        const double * __restrict__ x,
        const double * __restrict__ y,
        const double * __restrict__ freqs,
        double * __restrict__ pgram,
        const double * __restrict__ y_dot) {
    _cupy_lombscargle<double>(x_shape, freqs_shape, x, y, freqs, pgram, y_dot);
}
CUPY COMPARISON

[Table: same layout, LaunchBounds row added; cell values scrambled in extraction.]

Added __launch_bounds__ to the kernels, allowing further compiler optimizations.