OCP Global Summit | October 18, 2023 | San Jose, CA
Tushar Krishna (Georgia Tech/MIT), Srinivas Sridharan (NVIDIA)

Chakra and ASTRA-sim: An open-source ecosystem for advancing co-design for future AI systems

Trend 1: Large AI Models
Source: https:/

Trend 2: Large Training Datasets

System Implications
- Compute
  - Zetta-scale floating-point operations
  - Moore's Law is dead: cannot rely simply on processor scaling
- Memory
  - 10s of TB required
  - Multiple Neural Processing Units (NPUs) required to fit model weights
- Communication
  - 10s of GBs of collectives

Trend 3: Emerging HPC Systems for AI

Components of an AI System

Challenge: Complex SW/HW Co-Design Space
How do we navigate this?

Introducing Chakra and ASTRA-sim
- Chakra Execution Trace: an open graph-based representation of AI/ML workload execution
- ASTRA-sim: Distributed AI system simulator

Chakra for Workload Representation
Chakra Execution Trace: an open graph-based representation of AI/ML workload execution

Chakra: Motivation
Gaps in benchmarks today:
- Keeping pace with AI innovation and model updates
- Covering a wide range of use cases at production scale
- Isolating HW/SW bottlenecks
- Isolating compute, network, and memory behavior
- Reproducing without support infrastructure
- Obfuscating proprietary AI model details

Chakra Execution Traces
Hierarchical DAG
- Nodes
  - Primitive operators: compute, comms, memory
  - Tensor objects: shape, size, device (local/remote)
  - Timing and resource constraints
- Edges
  - Data dependency
  - Control dependency (e.g., call stack)
- Higher-level abstractions (e.g., components)
  - Composed of other components or primitive ops

Chakra Ecosystem and End-to-End Flow

Chakra is now part of MLCommons!
- Build consensus on Execution Trace methodology
- Enable easier sharing between hyperscaler/cloud and vendors (with/without NDA)
- Vendors can focus on different components (compute/memory/network)
- Enable faster ramp
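
To make the execution-trace idea concrete, here is a minimal Python sketch of a DAG whose nodes are primitive operators (compute, comms, memory) carrying tensor metadata and timing, and whose edges are data and control dependencies. Every name in it (`Node`, `Tensor`, `NodeType`, `execution_order`, `critical_path_us`, the field names, and the sample durations) is hypothetical and chosen for illustration; it is not the Chakra schema or API, and for brevity it leaves out the hierarchical grouping of nodes into components.

```python
from dataclasses import dataclass, field
from enum import Enum
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class NodeType(Enum):
    """Primitive operator kinds named on the slide (labels are illustrative)."""
    COMPUTE = "compute"
    COMM = "comm"
    MEMORY = "memory"


@dataclass
class Tensor:
    """Tensor metadata: shape, size, and device (local/remote)."""
    shape: tuple
    size_bytes: int
    device: str = "local"


@dataclass
class Node:
    """One trace node; dependency lists hold the ids of predecessor nodes."""
    node_id: int
    name: str
    node_type: NodeType
    duration_us: float = 0.0
    tensors: list = field(default_factory=list)
    data_deps: list = field(default_factory=list)   # data-dependency edges
    ctrl_deps: list = field(default_factory=list)   # control-dependency edges


def execution_order(nodes):
    """Return node ids in an order that respects data and control edges."""
    graph = {n.node_id: set(n.data_deps) | set(n.ctrl_deps) for n in nodes}
    return list(TopologicalSorter(graph).static_order())


def critical_path_us(nodes):
    """Longest dependency chain, assuming unlimited resources (an assumption)."""
    by_id = {n.node_id: n for n in nodes}
    finish = {}
    for nid in execution_order(nodes):
        n = by_id[nid]
        start = max((finish[d] for d in n.data_deps + n.ctrl_deps), default=0.0)
        finish[nid] = start + n.duration_us
    return max(finish.values(), default=0.0)


# Hypothetical four-node trace: load weights, multiply, all-reduce, update.
weights = Tensor((1024, 1024), 4 * 1024 * 1024)
grads = Tensor((1024, 1024), 4 * 1024 * 1024, device="remote")
trace = [
    Node(0, "load_weights", NodeType.MEMORY, 5.0, tensors=[weights]),
    Node(1, "matmul", NodeType.COMPUTE, 20.0, data_deps=[0]),
    Node(2, "all_reduce", NodeType.COMM, 30.0, tensors=[grads], data_deps=[1]),
    Node(3, "optimizer_step", NodeType.COMPUTE, 10.0, data_deps=[2], ctrl_deps=[1]),
]

print(execution_order(trace))
print(critical_path_us(trace))
```

A simulator would replay such a graph against a model of compute, memory, and network resources; the critical-path helper here only gives a lower bound on runtime because it ignores resource contention.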