OCP Global Summit, October 18, 2023 | San Jose, CA
Siamak Tavallaei, Chief Systems Architect; CXL Advisor to the Board, CXL Consortium

Title: Scale-up and Scale-out Challenges for a Disaggregated AI/ML Infrastructure

CXL as the standard protocol for data movement through coherent memory and for the associated management and system composability. Photonics interconnect for extended connectivity.

An Exa-FLOP AI/ML/HPC system requires 256 xPUs at 4 Peta-FLOPs each. Such an xPU may consist of a tightly connected constellation of chiplets. To feed the pipeline of such a compute engine, we need large SRAM backed by over 80 GiB of high-bandwidth DRAM. Operating such an xPU may require 1000 W of power and cooling. Each Node may hold eight such xPUs (32 Nodes, 8 kW each), and each Rack may hold four such Nodes (32 kW). These Nodes need to connect to the public world and be interconnected with each other (for parameter exchange) via an efficient fabric. Enabling technologies include PCIe, CXL, HBM, UCIe, and photonics. Four x16 CXL 3.0 ports running at 64 GT/s offer 1 TB/s of aggregate peak bandwidth. A photonics interconnect offers a degree of freedom in the distance and placement of compute and memory components while avoiding hop latency. Software/hardware co-design allows data to be present at the right xPU ahead of execution to keep the pipelines full.

Abstract

While focusing on AI/ML workloads, this presentation addresses a general-purpose system that serves multiple use cases, algorithms, and frameworks. Artificial Intelligence and Machine Learning (AI/ML) frameworks are in flux, and the associated systems are complex and expensive. While the compute elements may be optimized for different algorithms, and software programmers
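The node, rack, and fabric figures above can be checked with a short back-of-envelope calculation. This is a minimal sketch, not part of the presentation: the constants simply restate the abstract's numbers, and the CXL bandwidth math assumes the raw signaling rate (64 Gb/s per lane per direction, both directions counted) with protocol overhead ignored.

```python
# Back-of-envelope sizing for the disaggregated system described above.
# Constants restate the abstract's figures; names are illustrative.

XPUS = 256            # total xPUs in the system
XPUS_PER_NODE = 8     # xPUs per Node
NODES_PER_RACK = 4    # Nodes per Rack
XPU_POWER_W = 1000    # power and cooling budget per xPU

nodes = XPUS // XPUS_PER_NODE                        # Nodes in the system
node_power_kw = XPUS_PER_NODE * XPU_POWER_W / 1000   # kW per Node
rack_power_kw = NODES_PER_RACK * node_power_kw       # kW per Rack

# CXL 3.0 x16 port at 64 GT/s, counting both directions,
# ignoring encoding/protocol overhead (assumption).
LANES = 16
GT_PER_S = 64
port_gbs_unidir = LANES * GT_PER_S / 8       # GB/s per direction
port_gbs_bidir = 2 * port_gbs_unidir         # GB/s bidirectional per port
aggregate_tbs = 4 * port_gbs_bidir / 1000    # TB/s across four ports

print(f"{nodes} Nodes, {node_power_kw:.0f} kW/Node, {rack_power_kw:.0f} kW/Rack")
print(f"{aggregate_tbs:.3f} TB/s aggregate peak over 4 ports")
```

Running this reproduces the abstract's figures: 32 Nodes at 8 kW each, 32 kW per Rack, and roughly 1 TB/s of aggregate peak bandwidth across the four x16 ports.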