DESIGNING IN-NETWORK COMPUTING AWARE REDUCTION COLLECTIVES IN MPI
2024 OFA Virtual Workshop
Bharath Ramesh and Dhabaleswar K. (DK) Panda
Network Based Computing Laboratory, The Ohio State University
http://nowlab.cse.ohio-state.edu/

Outline
- Introduction
- Background
- Motivation
- Problem Statement and Contributions
- Design
  - Overview
  - Registration cache design
  - Proposed Allreduce design
- Results
- Conclusion and Future Work

Introduction: Drivers of Modern HPC Cluster Architectures
- Multi-core/many-core technologies
- Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand, RoCE, Slingshot)
- Solid State Drives (SSDs), Non-Volatile Random-Access Memory (NVRAM), NVMe-SSDs
- Accelerators (NVIDIA GPGPUs): high compute density, high performance/watt (9.7 TFlop DP on a chip)

[Slide graphic: high-performance interconnects (InfiniBand), multi-/many-core processors, SSD/NVMe-SSD/NVRAM; example systems: Frontier, Summit, Lumi, Fugaku]

MPI Reduction Collectives and In-Network Computing
- Reduction collectives (such as MPI_Allreduce) are important for HPC and AI; they involve both compute and communication
- Performing them entirely on CPUs leads to sub-optimal scale-up and scale-out efficiency, which motivates offloading common operations away from the CPU so it can perform other useful work
- In-network computing allows such operations to be offloaded to network devices
- Switches are a good candidate because of their high bandwidth and their ability to reduce data on the fly, eliminating redundant transfers
  - High scale-out efficiency and network-topology awareness
  - Frees up CPU cycles for other operations