Improving AI Cluster Performance by Tuning AI Ethernet Switch Features in SONiC (PDF, 15 pages)
Ethernet Based AI Cluster Fabric Performance Improvement - Tuning in SONiC
Nanda Ravindran
VP of Technical Sales

Ethernet-based AI scale-out fabrics need high bandwidth, low latency, and a lossless network, which together enable maximum GPU utilization and the best job completion time (JCT).

Enabling and tuning the following features in SONiC helps achieve a high-bandwidth lossless network:
    RoCEv2
    DCQCN (PFC, ECN)
    DLB

Key challenges in the tuning process:
    AI fabric (scale-out) design/topology differences
    Differences in NIC cards
    Differences in performance of transceivers and cables

AI Cluster Fabrics Tuning Challenges
    High Density
    Burstiness
    Low Entropy
    Elephant Flows

Test bed: AI Fabric Topology & Configuration

Configuration on Test Bed
    Data size: 512 Gbits
    RoCEv2 DSCP 48
    All-to-All 4x8 traffic pattern with PXN

Measurements
    AlgoBW
    BusBW
    Job Completion Time
    Packet drop
    PFC count
    ECN count

Test bed equipment
    Edgecore AIS800-64O switches, based on Broadcom TH5 silicon
    SONiC release 202311
    Spirent simulated AI traffic pattern
    Non-blocking CLOS Ethernet fabric

PFC Configuration

    sudo config qos reload
    sudo config qos dscp-tc add DSCP_TC -dscp 24-31 -tc 3
    sudo config qos tc-queue add TC_Q -tc 3 -queue 3
    sudo config qos tc-pg add tc-pg-prof -tc 3 -pg 3
    sudo config interface qos tc-queue bind $iface TC_Q
    sudo config interface qos tc-pg bind $iface tc-pg-prof
    sudo config interface qos dscp-tc bind $iface DSCP_TC
    sudo config interface buffer bind priority-group all 3 ingress_lossless_profile
    sudo config interface buffer bind queue all 3 egress_lossless_profile
    sudo config interface pfc priority $iface 3 on

sudo config interface pfc priority $iface 3 on: Enables PFC on the specified interface for priority 3, allowing flow control for lossless traffic.

sudo config qos reload: Reloads the current QoS configuration into SONiC. This is required before applying any QoS m