CUBE: A Highly Scalable Cosmological N-body Simulation Code
Shenggan Cheng, Hang Hu, Dereck Inman, Hao-ran Yu, James Lin
Shanghai Jiao Tong University, Xiamen University
#page#
Introduction
Overview of CUBE
Porting and optimization of CUBE on GPU
Advanced Optimization
#page#
N-body Simulations Empower Scientific Research
Neutrino mass
Particle physics determines the lower bound: 0.05 eV
Astrophysics and cosmology determine the upper bound

1. Compute the grid density (Cloud-in-Cell)
2. Solve the Poisson equation on the mesh (FFT)
3. Calculate the force field from the mesh-defined potential
4. Interpolate the force on the grid to find the forces on particles
Global FFT: applied on the coarse mesh, needs MPI FFT
Local FFT: applied on the fine mesh
[Figure: domain decomposition into processes and tiles; each tile has a physical region and a buffer region; the global FFT spans processes]
#page#
Overview of CUBE - PM-PM-PP
Solve for Forces
$F(t) = F_{\mathrm{GlobalPM}} + F_{\mathrm{LocalPM}} + F_{\mathrm{PP}}$
$F_{\mathrm{PP}} = G m_i m_j \dfrac{x_j - x_i}{\left(\epsilon^2 + |x_j - x_i|^2\right)^{3/2}}$, where $\epsilon$ is the softening length
[Figure: local FFT on tiles within each process, with physical and buffer regions; global FFT across processes]
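The softened particle-particle term can be illustrated with a minimal sketch. It assumes a Plummer-style softening, arbitrary units with `G = 1`, and a brute-force loop over all pairs. The function names `pp_force` and `total_pp_forces` are hypothetical and not taken from CUBE, which restricts the PP sum to nearby particles rather than summing over every pair.

```python
def pp_force(xi, xj, mi, mj, G=1.0, eps=0.1):
    """Softened force on particle i due to particle j (3D vectors as lists).

    Assumed form: G * mi * mj * (xj - xi) / (eps^2 + |xj - xi|^2)^(3/2).
    """
    dx = [b - a for a, b in zip(xi, xj)]
    r2 = sum(d * d for d in dx)
    scale = G * mi * mj / (eps * eps + r2) ** 1.5
    return [scale * d for d in dx]


def total_pp_forces(pos, mass, G=1.0, eps=0.1):
    """Accumulate pairwise forces for all particles (O(N^2), illustration only)."""
    n = len(pos)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            f = pp_force(pos[i], pos[j], mass[i], mass[j], G, eps)
            for k in range(3):
                forces[i][k] += f[k]   # force on i from j
                forces[j][k] -= f[k]   # equal and opposite force on j
    return forces


if __name__ == "__main__":
    positions = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
    masses = [1.0, 1.0, 1.0]
    for f in total_pp_forces(positions, masses):
        print(f)
```

The softening length keeps the force finite when two particles come very close: with `eps > 0` the denominator never reaches zero, so coincident particles exert zero force on each other instead of a divergent one.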
#page#
Data Structure in PP
1D format storage for particles
Ordered by coarse mesh (considering memory cost)
$F_{\mathrm{PP}} = G m_i m_j \dfrac{x_j - x_i}{\left(\epsilon^2 + |x_j - x_i|^2\right)^{3/2}}$
[Figure: PP range shown on the fine mesh and the coarse mesh]
Memory is disordered with respect to the fine mesh, so a linked list is used: head of the linked list, index of the next particle, Null at the end
#page#
Overview of CUBE - Mixed Precision
Simulating the N-body problem requires a lot of memory. For example, if you use float to store particle information, $10^{12}$ particles need more than 105 TB of memory.
CUBE uses int16/int8 to store particle information, breaking the memory capacity bottleneck.
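How a coordinate might be stored in an int16 can be sketched as follows. This is a minimal illustration, not CUBE's actual code: it assumes positions are measured in units of the cell size, that each coordinate is split into an integer cell index plus a signed 16-bit offset inside that cell, and that decoding places the value at the center of its quantization bin. The names `compress` and `decompress` are hypothetical.

```python
import math

BITS = 16                      # assumed: one int16 per coordinate
LEVELS = 2 ** BITS             # number of quantization bins per cell
HALF = 2 ** (BITS - 1)         # shift that maps bins to the signed range


def compress(x):
    """Split x (in cell units) into (cell index, signed 16-bit offset)."""
    cell = math.floor(x)
    bin_index = math.floor((x - cell) * LEVELS)
    bin_index = min(bin_index, LEVELS - 1)   # guard against rounding to 1.0
    return cell, bin_index - HALF            # offset lies in [-32768, 32767]


def decompress(cell, chi):
    """Recover the position at the center of the quantization bin."""
    return cell + (chi + HALF + 0.5) / LEVELS


if __name__ == "__main__":
    x = 3.7
    cell, chi = compress(x)
    x_back = decompress(cell, chi)
    print(cell, chi, x_back, abs(x_back - x))
```

Under these assumptions the round-trip error is at most half a bin, i.e. 2^-17 of a cell. Three int16 offsets take 6 bytes per particle, compared with 12 bytes for three 32-bit floats. If particles are kept ordered by cell, the cell index does not have to be stored per particle.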
$\chi_\alpha = \left\lfloor 2^{8 n_\chi} \left( x_\alpha - \lfloor x_\alpha \rfloor \right) \right\rfloor - 2^{8 n_\chi - 1}$
$x_\alpha = (n_c - 1) + 2^{-8 n_\chi} \left( \chi_\alpha + 2^{8 n_\chi - 1} + 1/2 \right)$
#page#
Overview of CUBE - Mixed Precision
Floating-point formats: float16, bfloat16
Fixed-point formats: integer16, integer8, dynamic fixed-point
A100: FP64/32/16, TF32, INT8/4/2
A64FX: FP64/32/16, INT64/32/16/8
Mixe