Performance Evaluation of Interconnect Technologies for AI Scale-Up Computing: UAL vs UALoE/SUE vs RoCE

Srihari Vegesna, VP Architecture & Technology, Upscale AI
Srinivas Gangam, Fellow Architecture, Upscale AI

OCP SPECIAL FOCUS: ARTIFICIAL INTELLIGENCE (AI)

Scale Up Domain Interconnect Transport

Memory Semantics for AI Scale Up: XPU UALoE/SUE Framework

[Diagram: XPU UALoE/SUE framework]
Blocks: XPU Kernel Thread, TLB, Compute Tile (CT) & Memory, Network On Chip (NOC), Networking Tile (NT), Memory, XPU/Host Memory, ScaleUp Ethernet Switch
Transmit path: NOC Port, Packing, Queueing & Scheduling, Header Encapsulation, Link Layer Retry, Ethernet Link & PHY
Receive path: NOC Port, Unpacking, Parsing, Link Layer Retry, Ethernet Link & PHY

Notes:
- The Ethernet header is optimized for performance.
- If there is no link layer retry, end-to-end reliability must be achieved outside the link layer. FEC alone is not sufficient.
- One queue per (XPU, TC) pair.
- The workload is load balanced across multiple Ethernet NOC ports.

XPU UAL Framework

[Diagram: XPU UAL framework]
Blocks: XPU Kernel Thread, TLB, Compute Tile (CT) & Memory, Network On Chip (NOC), Networking Tile (NT), Memory, XPU/Host Memory, ScaleUp UAL Switch
Transmit path: NOC Port, Transaction Layer, Link Layer Retry, Ethernet Link & PHY
Receive path: NOC Port, Transaction Layer, Link Layer Retry, Ethernet Link & PHY

Notes:
- The workload is load balanced across multiple Ethernet NOC ports.
- A lightweight shim may be needed to convert the memory semantic interface to the UPLI interface.
- No packing, queueing, or scheduling logic: significant area, power, and latency savings.
- No unpacking logic: area and power savings.

Simplified RoCE for XPU Workload
RoCE Semantic Based Data Transfer: XPU RoCE Framework

[Diagram: XPU RoCE framework]
Blocks: XPU Kernel Thread, Block Load Balance
Transmit path: NOC Port, DMA Command Rd & Reorder, Queueing & Scheduling, Header Encapsulation, Link Layer Retry, Ethernet Link & PHY
Receive path: NOC Port, DMA Wr Parsing, Link Layer Retry, Etherne