2024 ENFABRICA CORPORATION. ALL RIGHTS RESERVED.
ACF-S: A Novel Approach to High-Performance Data Movement in AI Compute Fabrics
September 10, 2024
Rochan Sankar, Enfabrica

What we are about
Mission: redefine networking for accelerated computing to deliver peak performance, resiliency and node scale.
Team: 120+ engineers who previously built high-performance NICs, switches, routers, TPUs, graphics, and host networking stacks.
Product: Accelerated Compute Fabric SuperNIC (ACF-S); first chip, codename "Millennium".
[Figure: 8 Tbps ACF-S card. 8 Tbps bandwidth: 3.2 Tbps Ethernet; 5+ Tbps, 128 lanes PCIe 5/6; ARM CPU; ACF-S; SW stack.]

A systems perspective: scale-up supercomputing / mainframe, ccNUMA
Fully coherent memory system operating on a "large" problem by sharding computation.
Worker nodes synchronize state and move memory closer using O(100 ns)-latency IPC transactions.
Communication protocols are deeply embedded in the processor to enable "transparent" communication.
[Figure: eight CPUs; all blue links are IPC communication.]

Hyperscale cloud / the rise of scale-out computing
Client-server design, built for extreme, resilient application scaling.
All communication uses retargetable, resilient, software-managed RPCs (request-response).
Workers and data pipelines are eminently reconfigurable.
Distributed, heterogeneous compute nodes with high aggregate network throughput.
High tolerance to latency (10s to 1000s of microseconds).
No shared fate.
[Figure: a load balancer/requestor connected to sharded/replicated workers on six CPUs; all green links are sharded RPC communication.]

Data center AI/ML systems / super, meet hyper
Modern infr