DESIGNING IN-NETWORK COMPUTING AWARE REDUCTION COLLECTIVES IN MPI
2024 OFA Virtual Workshop
Bharath Ramesh and Dhabaleswar K. (DK) Panda
Network Based Computing Laboratory, The Ohio State University
http://nowlab.cse.ohio-state.edu/

Outline
- Introduction
- Background
- Motivation
- Problem Statement and Contributions
- Design
  - Overview
  - Registration cache design
  - Proposed Allreduce design
- Results
- Conclusion and Future Work

Introduction: Drivers of Modern HPC Cluster Architectures
- Multi-core/many-core technologies
- Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand, RoCE, Slingshot)
- Solid State Drives (SSDs), Non-Volatile Random-Access Memory (NVRAM), NVMe-SSDs
- Accelerators (NVIDIA GPGPUs): high compute density, high performance/watt (9.7 TFlop DP on a chip)

[Slide graphic: high-performance interconnects (InfiniBand), multi-/many-core processors, SSD/NVMe-SSD/NVRAM; example systems: Frontier, Summit, Lumi, Fugaku]

MPI Reduction Collectives and In-Network Computing
- Reduction collectives (such as MPI_Allreduce) are important for HPC and AI; they involve both compute and communication
- Performing them entirely on CPUs leads to sub-optimal scale-up and scale-out efficiency, which motivates offloading common operations away from the CPU so it can perform other useful work
- In-network computing allows such operations to be offloaded to network devices
- Switches are a good candidate because of their high bandwidth and their ability to reduce data on the fly, eliminating redundant transfers
  - High scale-out efficiency and network-topology awareness
  - Frees up CPU cycles for other operations