Ray on Spark
Ben Wilson, Databricks
Jiajun Yao, Anyscale
Databricks 2023

Who we are
- Ben Wilson: works with ML open source software at Databricks; MLflow maintainer.
- Jiajun Yao: software engineer at Anyscale; Ray committer.

Agenda
- What is Ray
- What is Ray-on-Spark
- Why Ray-on-Spark
- How to use Ray-on-Spark
- Demos
- How does Ray-on-Spark work
- Future work

What is Ray
An open-source unified distributed framework that makes it easy to scale AI and Python applications. An ecosystem of Python libraries (for scaling ML and more). It makes distributed computing easy and accessible to everyone, and it runs on laptops, in public clouds, on Kubernetes (K8s), and on-premise.
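The slides show no setup code at this point; as a small illustrative sketch (not from the talk), a single ray.init() call starts Ray locally, and the same call can attach to an existing cluster wherever it runs.

    import ray

    # Start a local, single-machine Ray instance (e.g. on a laptop).
    ray.init()

    # Alternatively, attach to an already running Ray cluster (cloud, K8s, on-premise):
    # "auto" finds a cluster started on this node, or pass the head node address explicitly.
    # ray.init(address="auto")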
What is Ray
- Run anywhere
- General-purpose framework for distributed computing
- Library + app ecosystem
- Ray core

What is Ray

Function:

    def read_array(file):
        # read ndarray "a" from "file"
        return a

    def add(a, b):
        return np.add(a, b)

    a = read_array(file1)
    b = read_array(file2)
    sum = add(a, b)

Class:

    class Counter(object):
        def __init__(self):
            self.value = 0

        def inc(self):
            self.value += 1
            return self.value

    c = Counter()
    c.inc()
    c.inc()

What is Ray

Function → Task:

    @ray.remote
    def read_array(file):
        # read ndarray "a" from "file"
        return a

    @ray.remote
    def add(a, b):
        return np.add(a, b)

    a_ref = read_array.remote(file1)
    b_ref = read_array.remote(file2)
    sum_ref = add.remote(a_ref, b_ref)
    sum = ray.get(sum_ref)

Class → Actor:

    @ray.remote
    class Counter(object):
        def __init__(self):
            self.value = 0

        def inc(self):
            self.value += 1
            return self.value

    c = Counter.remote()
    c.inc.remote()
    c.inc.remote()

What is Ray
High-level libraries that make scaling easy for both data scientists and ML engineers.
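The slides do not show these libraries in code; as one illustrative sketch (not from the talk), this is roughly how a hyperparameter sweep scales with Ray Tune's 2.x Tuner API. The objective function and search space below are made up for illustration.

    from ray import tune

    # Hypothetical objective: any Python function that returns a metrics dict.
    def objective(config):
        return {"score": (config["x"] - 3) ** 2}

    # Each sampled configuration runs as its own Ray trial, in parallel on the cluster.
    tuner = tune.Tuner(
        objective,
        param_space={"x": tune.grid_search([0, 1, 2, 3, 4])},
    )
    results = tuner.fit()
    print(results.get_best_result(metric="score", mode="min").config)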
What is Ray
- 1,000+ organizations using Ray
- 25,000+ GitHub stars
- 5,000+ repositories depend on Ray
- 820+ community contributors

What is Ray-on-Spark
A library to deploy Ray clusters on Spark and run Ray applications, built on top of Ray core (see the usage sketch below).

Why Ray-on-Spark
User asks: Spark users want to use both Spark MLlib and Ray
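The "How to use Ray-on-Spark" part of the talk comes later; as a minimal sketch of the idea, Ray provides ray.util.spark.setup_ray_cluster and shutdown_ray_cluster to start and stop a Ray cluster on a Spark cluster's resources. The worker-node count argument shown (num_worker_nodes) matches early Ray 2.x releases and may be named differently in other versions, so treat the exact arguments as assumptions.

    import ray
    from ray.util.spark import setup_ray_cluster, shutdown_ray_cluster

    # Start Ray worker nodes inside the Spark cluster's executors.
    # The argument name (num_worker_nodes here) varies across Ray versions.
    setup_ray_cluster(num_worker_nodes=2)

    # setup_ray_cluster exposes the cluster address (e.g. via RAY_ADDRESS),
    # so a plain ray.init() in the driver connects to that cluster.
    ray.init()

    @ray.remote
    def square(x):
        return x * x

    print(ray.get([square.remote(i) for i in range(4)]))  # [0, 1, 4, 9]

    # Tear down the Ray cluster and release its Spark resources.
    shutdown_ray_cluster()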