The Upcoming Apache Spark 4.1: The Next Chapter in Unified Analytics
DB Tsai (GitHub: dbtsai), Xiao Li (GitHub: gatorsmile)
2025-06

About Us
Xiao Li (GitHub: gatorsmile), DB Tsai (GitHub: dbtsai)
Spark Team
Apache Spark Committer and PMC members

Spark 4.0
Released this May
5000 Resolved JIRAs
400 Contributors
50,000 Commits!
880,000 Comments!

Distribution of PR Creators
45 Countries and Regions

PyPI downloads of PySpark in the last 12 months

Unified engine across data sources, workloads and environments

Apache Spark K8S Operator
A subproject of Apache Spark
Spark K8S 0.3
Extend K8s resource manager to manage Apache Spark applications and clusters via Operator Pattern

New Python Data Source APIs
AI-generated customized data sources
Apache Spark 4.0

Python Users
Apache Spark 4.0: Native Plotting with Plotly, Python DataSource API, transformWithState API, applyInArrow, DF.toArrow, Lightweight Python Client
Apache Spark 4.1: Declarative Pipeline, Temp Table, UDF Debugging, Real-time Streaming, Optimized Python Data Source API

SQL Users
Apache Spark 4.0: ANSI SQL Mode By Default, SQL UDF/UDTF, PIPE Syntax, Language-Aware Collations, Dynamic Session Variables
Apache Spark 4.1: Declarative Pipeline, SQL Scripting GA, SQL Stored Procedure, Recursive CTE, New Data Type

New Spark Connect Clients in any language!

Apache Spark 4.0 Major Features
Categories: Spark Connect, Streaming & Connectors, Usability, Scripting & UDFs, More
Features: PySpark UDF Unified Profiling, State Data Source Reader, Variant Data Types, Spark K8S operator, SQL Scripting, Structured Logging, ANSI Mode Default, Arbitrary Stateful Processing V2, Arrow optimized Python UDF, XML Connector, Java 21, Error Context, SQLState, Polymorphic Python UDTF, Scala Client, Compatibility, applyInArrow, DF.toArrow, SQL UDF/UDTF, PIPE Syntax, Swift, Go, Rust clients, Native Plotting, Spark ML, Python