Accelerated AI with P4 DPUs: Next-Level Flexibility, Performance & Scale on AMD AI NIC

2 | High Performance AI Networking
    Solving the Most Critical Challenges for Scaling AI and HPC
    Open
    Programmable
    Performant

3 | Large-scale AI Requires Next-Gen Networking
    Intelligent End Points
    Scale-Out
        Message semantics
        Higher scale with multi-tier fabric
        Load balancing
        Resilience in the transport layer to recover from occasional packet loss
    Scale-Across (Site-1, Site-2)
        Addresses the power constraint
        Higher latency
        Non-uniform bandwidth
        Intelligent congestion control and load balancing
    Scale-Up
        High bandwidth (10x that of scale-out)
        Low latency
        Unified memory space
        Load/Store semantics
        Extremely sensitive to failure and packet loss

4 | Open Interoperable Solutions for AMD Instinct AI Networking
    Pollara 400 AI NIC Product Options
    RoCEv2, UEC-Ready RDMA & Custom Transport
    Leadership Performance; P4 Programmable
    AMD Instinct MI3XX GPU
    Pollara400-1Q400P (PCIe Gen5)
    Pollara400-1Q400P-OCP (OCP 3.0)
    New Product Launch
    Pollara 400: Ethernet Designed for the Era of AI

5 | Driving Industry Momentum Through Strategic Partnerships
    COMPUTE
    NETWORK

6 | AMD Pensando Pollara 400 AI NIC
    Single Platform for Front-End and Back-End Networking
    Unlocking Unprecedented Performance & Scalability for AI
    Performance Leadership    10%    Over Competition
    Massive Cluster Scale     20X    Over InfiniBand
    Lower Network Cost        50%    With Multi-Plane

7 | Up to 10% better RoCEv2 Performance
    Pollara 400: Up to 1.1x
    8% Reduced Job Completion Time (TFLOPS/s/GPU)
    Key Features & Benefit