Ethernet Fabric for High Performance Computing and AI ML workloads (PDF, 18 pages)
EMPOWERING OPEN

MIT Technology Review, Feb 2020
Scale Up / Scale Out
Digital Life as We Know it Would Not Exist
THE NETWORK IS THE COMPUTER
600M Ethernet ports shipped annually

Ethernet Leads in Switch Bandwidth and Power

Year  Switch     Bandwidth  Port speed, lane signaling
2010  Trident    640G       40GbE, 10G NRZ
2012  Trident2   1.28T      40GbE, 10G NRZ
2014  Tomahawk   3.2T       100GbE, 25G NRZ
2016  Tomahawk2  6.4T       100GbE, 25G NRZ
2018  Tomahawk3  12.8T      400GbE, 50G PAM4
2020  Tomahawk4  25.6T      400GbE, 50G/100G PAM4
2022  Tomahawk5  51.2T      800GbE, 100G PAM4 (Shipping Now)

Power: from 10W/100Gbps to 1W/100Gbps

Relentless, Unmatched Advancement
  80x Increased Bandwidth
  90% Improved Energy Efficiency

Significant Acceleration of Compute for AI
  CPU: Optimized for Serial Tasks, Multiple Cores
  GPU: Optimized for Parallel Tasks, Thousands of Cores
  SELF-DRIVING CARS, GENOMICS, BARD

“Network I/O is Key For Recommendation Workloads”
  [Chart: Time Spent in Networking for models M1 to M4 (M# = ML model #): 35%, 57%, 18%, 38%]
  Ranking requires high injection & bisection bandwidth
  OCP keynote by Alexis Bjorlin at 2022 OCP Global Summit

“Time Spent in Networking” is Impacted By
  Transient oversubscription
  Flow collisions
  Incast

Jericho3-AI Fabric: Ethernet Networking for AI at Scale
  32,000 AI Accelerators at 800Gbps each
  Lowest “Time Spent in Networking”
  [Diagram: AI Accelerators connected through the Jericho3-AI Fabric]
  “10% Performance improvement = network more than pays for itself”

Jericho3-AI Fabric Innovations: Highest Performance Under Load and at Scale
  Congestion-Free Operation
    End-to-end scheduled fabric
    No collisions, no jitter
  Ultra-High Radix
    Massive, flat networks
    32,000 ports at 800GE
  Zero Impact Failover (ZIF)
    Sub-10ns auto-path convergence
    No impact to job completion
  Perfect Load Balancing
    Equal spraying over all links of the fabric
    Consistent high performance at all network loads

Ethernet Beats InfiniBand: 10+% Improvement in Job