Use Apache Spark from Anywhere: Remote Connectivity with Spark Connect
Stefania Leone, Sr. Manager Product Management, Databricks
Martin Grund, Sr. Staff Software Engineer, Databricks

Product Safe Harbor Statement
This information is provided to outline Databricks' general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks' discretion and may not be delivered as planned or at all.

Who develops with OSS Spark locally? What about the data?
Who uses Apache Livy or JDBC to connect to Spark?

Today's developer experience requirements
- Be close to data during development
  - Software engineering best practices
  - Interactive exploration
  - High production fidelity: develop and run close to data
- Better remote connectivity
  - From any application
  - From any language

How to build on Spark? Up until Spark 3.4, it was hard to support today's developer experience requirements. Applications, IDEs/notebooks, and programming languages or SDKs without JVM interop got, at best, something close to a REPL, or SQL-only access. Spark's monolithic driver bundles the application logic together with the analyzer, optimizer, scheduler, and distributed execution engine.

Apache Spark 3.4: Spark Connect
Remote connectivity: a thin client with the full power of Apache Spark. Modern data applications (applications, IDEs/notebooks, programming languages/SDKs) talk to the Spark Connect client API; Spark's driver acts as an application gateway in front of the analyzer, optimizer, scheduler, and distributed execution engine.

Client/server example: the client builds the query

spark.read.table("logs")
  .select("id", extract_profile("blob"))
  .write.insertInto("profiles")

and sends it to the server, where it resolves to the logical plan

InsertInto profiles
+- Proje
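In PySpark 3.4+, a Spark Connect client opens such a remote session with SparkSession.builder.remote("sc://host:port") instead of starting a local driver. As a rough illustration of the connection-string convention only, here is a small hypothetical helper (not part of Spark) that splits an sc:// string into its endpoint and parameters; the sc:// scheme, the ;key=value parameter syntax, and the default port 15002 follow the Spark Connect convention:

```python
# Hypothetical helper (illustration only, not part of Spark): split a
# Spark Connect connection string of the form
#   sc://host:port;param1=value1;param2=value2
# into host, port, and options. In real code you would pass the string
# directly to SparkSession.builder.remote(...) in PySpark 3.4+.

def parse_connect_uri(uri: str) -> dict:
    scheme, _, rest = uri.partition("://")
    if scheme != "sc":
        raise ValueError(f"expected sc:// scheme, got {scheme!r}")
    endpoint, *params = rest.split(";")
    host, _, port = endpoint.partition(":")
    # 15002 is the default Spark Connect server port.
    options = dict(p.split("=", 1) for p in params if p)
    return {"host": host, "port": int(port) if port else 15002, "options": options}


print(parse_connect_uri("sc://spark.example.com:15002;user_id=alice"))
# {'host': 'spark.example.com', 'port': 15002, 'options': {'user_id': 'alice'}}
```

Because the client only serializes unresolved plans and ships them to this endpoint over gRPC, any language that can speak the protocol can drive the same server, which is the point of the thin-client architecture above.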