Advanced NIC Simulation For AI Back-End Use Cases
Artificial Intelligence (AI)
Presenters: Vijay Ram Inavolu, Satananda Burla
Marvell Technologies
Contributors: Veerasenareddy Burru, Pradeep Kumar Nalla, Radha Mohan Chintakuntla

Agenda
- The need for advanced NIC simulation in AI clusters
- HTSim: A scalable, fast network simulator
- RoCEv2 simulation in HTSim
- Architecture: New PCIe + NIC models
- Insights: Key insights to extract from PCIe and NIC models
- Early insights: simulation results
- Call to action: collaboration opportunities and next steps

Why NIC simulation is essential for AI infrastructure
- LLM workloads demand ultra-low latency, high bandwidth, and power efficiency
- AI collectives (e.g., AllReduce) create highly variable, cluster-specific traffic patterns
  - Unlike traditional data centre networking load
- Massive scale makes bottlenecks invisible until hardware deployment, when it is too late to fix them cheaply
- Simulation reveals performance bottlenecks early, from chip level to cluster wide
- Simulation-driven design optimizes NIC performance at scale before silicon

HTSim: Network simulation without packets
- High-performance discrete event simulation
- Models network behaviour as events instead of actual packets
- Faster and more scalable than traditional full network simulators
- Network is abstracted as a combination of pipes for delays and queues for processing capacity
- Memory efficient, as packet representor objects are created only once
- Primary design goal is to evaluate congestion at scale
- Extensible to add new protocols like UET, Falcon, etc.

RoCEv2 is the protocol of choice for AI backend communications.
RoCEv2 is simulated in HTSim using packets with sequence numbers
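The pipe-and-queue abstraction described for HTSim can be illustrated with a minimal discrete-event sketch. This is not HTSim code: the class names (`EventList`, `Pipe`, `Queue`, `Sink`), the `simulate` helper, and all parameter values are hypothetical, chosen only to show how a fixed-delay pipe and a rate-limited queue combine under an event scheduler.

```python
import heapq


class EventList:
    """Minimal discrete-event scheduler (hypothetical, not HTSim's API)."""

    def __init__(self):
        self.now = 0.0
        self._heap = []
        self._seq = 0  # tie-breaker so callbacks are never compared

    def schedule(self, delay, callback):
        heapq.heappush(self._heap, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run(self):
        while self._heap:
            self.now, _, callback = heapq.heappop(self._heap)
            callback()


class Pipe:
    """Models a link as a pure delay with no capacity limit."""

    def __init__(self, events, delay_us, next_hop):
        self.events, self.delay_us, self.next_hop = events, delay_us, next_hop

    def receive(self, pkt):
        self.events.schedule(self.delay_us, lambda: self.next_hop.receive(pkt))


class Queue:
    """Models processing capacity: a FIFO drained at a fixed byte rate."""

    def __init__(self, events, bytes_per_us, next_hop):
        self.events, self.bytes_per_us, self.next_hop = events, bytes_per_us, next_hop
        self.busy_until = 0.0  # time at which the last queued packet finishes

    def receive(self, pkt):
        start = max(self.events.now, self.busy_until)
        self.busy_until = start + pkt["size"] / self.bytes_per_us
        self.events.schedule(self.busy_until - self.events.now,
                             lambda: self.next_hop.receive(pkt))


class Sink:
    """Records (sequence number, arrival time) for each packet."""

    def __init__(self, events):
        self.events, self.arrivals = events, []

    def receive(self, pkt):
        self.arrivals.append((pkt["seq"], self.events.now))


def simulate():
    # Assumed values: 3 packets of 1000 bytes, 1000 bytes/us queue, 5 us pipe.
    events = EventList()
    sink = Sink(events)
    pipe = Pipe(events, delay_us=5.0, next_hop=sink)
    queue = Queue(events, bytes_per_us=1000.0, next_hop=pipe)
    for seq in range(3):
        queue.receive({"seq": seq, "size": 1000})  # each packet object is created once
    events.run()
    return sink.arrivals


print(simulate())  # [(0, 6.0), (1, 7.0), (2, 8.0)]
```

In this sketch the three packets arrive back to back at time 0, so the queue serializes them 1 us apart and the pipe then adds a constant 5 us. The only work done per packet is scheduling two events, which is the property that lets an event-driven model scale to large topologies.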