OCP Global Summit, October 18, 2023 | San Jose, CA
Pankaj Mehra, Elephance Memory, Inc.
XPI: Acceleration Programming Interface

Scope of XPI Work
Define and evaluate, through implementation and integration, new, dev-friendly programming interfaces for Near Data Processing (NDP) infrastructure now emerging at the intersection of:
Domain-Specific Architecture (major trend) and
Computational Memory/Storage (major opportunities, not yet a trend).

1. Data Infra
2. Scalable AI/ML
3. Media Processing
4. Memory Nodes

Storage device/array w/ exposed/pvt RAM w/ pre-/post-CXL NDP in front, integrated behind switch, or accelerator-first

1. DPUs
2. Comp. Storage
3. Comp. Memory

Targeted Developer Communities:
Adopters: Scale, Standardize, Accelerate
Providers: Mutual and External Consistency

Modern Storage Workloads
EB-scale storage, PB-scale memory, open format data
Data Gravity is a key consideration for power and performance
The need for End-to-End acceleration

Compute-Memory Hierarchy
[Figure: compute hierarchy and memory hierarchy. Labels: Package, Server, Rack, Fabric; V$, HBM, DDR, CXL 1, CXL 2, CXL 3, RDMA, NVMe, UPI]
CXL Type 2 accelerator with memory: ideal for near-memory processing

Command-Heavy & Communication-Poor interfaces of existing standards
Consider offloading sorts or group-bys on sharded tables (DB), or convolution offload for graph neural networks (DL), or offloading subgraph building against embeddings (DLRM)
The unit of offload is tiny; curse of Amdahl!
Hosts coordinate all data movement between host:device (North-South) using eager data movement semantics even though CXL enables lazy semantics
device:device (East-West) data movement is not contemplated even though computational memories will naturally move data peer-to-peer in CXL 3
Existing interfaces fundamentally lacking for CXL-enabled accelerators and memory
Lack of application-specific optimization over function/data placement, movement
F