Flame - an intelligent distributed engine for AI and Quant - Klaus Ma
Flame: A distributed system for intelligent workload
Klaus Ma (k82cn, Nvidia)

Content
01 Why Flame?
02 Use Cases
03 Architecture
04 Roadmap

Why Flame?
- Currently, more and more frameworks are being introduced for specific domains, e.g., BigData, AI; but their meta operations are similar, e.g., matrix, gradient, map/reduce.
- Meanwhile, there is no meta framework for specific distributed system use cases, e.g., Monte Carlo, Crawler, Encoder.
- Flame is a distributed engine for domain-specific frameworks; Flame focuses on the meta-API, performance, throughput, scheduling and high availability.
- Exploring a new way of data sharing/exchanging.

Example: Pi by Monte Carlo
Pi Client:
1. Create a callback for each task
2. Create tasks based on the input
3. Wait for all tasks to complete
4. Print the estimate of Pi
Pi Service:
1. Generate random points based on the input
2. Print how many points are in the circle

Example: Matrix Multiplication
Matrix Client:
1. Create a callback for each task
2. Create tasks based on the input
3. Wait for all tasks to complete
4. Print all items of the matrix
Matrix Service:
1. Calculate Cij as the result

Overall Architecture
[Architecture diagram; labelled components: Flame SDK (core), Flame Python SDK (on-going), Session Manager, Executor Manager]

Roadmap
- gRPC shim for all languages, e.g. Python, R, C++ (Done)
- Distributed storage/cache for data sharing/exchanging
- PyTorch, TensorFlow examples, e.g. distributed training
- TLS/mTLS enhancement for all connections
- More scheduling policies, e.g. priority, minService/maxService, data-aware scheduling
- Resource manager integration, e.g. Kubernetes
- Misc, e.g. CLI, metrics, dashboard
- flame-operator to simplify operations
- Documentation and tutorials

References
- flame-sh/flame: A distributed system for intelligent workload
- Monte Carlo method - Wikipedia
- Matrix multiplication - Wikipedia

Thanks.
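
The Pi and Matrix Multiplication examples share one client/service shape: the client creates a callback per task, creates the tasks, waits for completion, and reports the result, while the service does one unit of work. The sketch below illustrates that shape only. It does not use the Flame SDK: Python's standard `concurrent.futures.ThreadPoolExecutor` stands in for a session, and the names `pi_service`, `matrix_service`, `estimate_pi`, and `multiply` are hypothetical. It also assumes the services return their results to the client rather than printing them.

```python
import random
from concurrent.futures import ThreadPoolExecutor, wait


def pi_service(num_points, seed):
    """Service side: generate random points and count those inside the unit circle."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_points):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def matrix_service(row, col):
    """Service side: compute one element C[i][j] as a row-by-column dot product."""
    return sum(a * b for a, b in zip(row, col))


def estimate_pi(num_tasks=8, points_per_task=20_000):
    """Client side: one callback per task, create tasks, wait, then estimate Pi."""
    counts = []
    # The thread pool is a local stand-in for a session; it is not the Flame API.
    with ThreadPoolExecutor() as pool:
        futures = []
        for task_id in range(num_tasks):
            fut = pool.submit(pi_service, points_per_task, task_id)
            fut.add_done_callback(lambda f: counts.append(f.result()))
            futures.append(fut)
        wait(futures)
    # Leaving the `with` block joins the workers, so every callback has run by now.
    return 4.0 * sum(counts) / (num_tasks * points_per_task)


def multiply(a, b):
    """Client side: one task per element C[i][j]; each callback stores its element."""
    n, m = len(a), len(b[0])
    cols = list(zip(*b))  # columns of b
    c = [[0] * m for _ in range(n)]

    def make_callback(i, j):
        def on_done(fut):
            c[i][j] = fut.result()
        return on_done

    with ThreadPoolExecutor() as pool:
        futures = []
        for i in range(n):
            for j in range(m):
                fut = pool.submit(matrix_service, a[i], cols[j])
                fut.add_done_callback(make_callback(i, j))
                futures.append(fut)
        wait(futures)
    return c


if __name__ == "__main__":
    print("Pi estimate:", estimate_pi())
    print("Matrix product:", multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

Each task is independent of the others, which is why both workloads map onto a pool of interchangeable service instances; in a distributed engine the pool would presumably be replaced by remote executors.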