Optimizing Batch and Streaming Aggregations (PDF)

No. 139024, PDF, 28 pages, 648.58 KB

Jacek Laskowski / jaceklaskowski
Optimizing Batch and Streaming Aggregations
Data+AI Summit 2023

About the Speaker
- Jacek Laskowski is a Freelance IT Consultant
- Specializing in Apache Spark, Delta Lake, Databricks, Apache Kafka (incl. Kafka Streams and ksqlDB)
- Best known by The Internals Of online books
- Contact me at jacek@japila.pl
- Follow me at JacekLaskowski
- Connect on LinkedIn

Table of Contents
1. The Intro to The Internals of Structured Queries
2. The Internals of Aggregate Queries
3. Scala UDAFs and Aggregators
4. Streaming Aggregates
5. Streaming Aggregates Performance Tuning Gig
6. Things to Watch Out For (Recap)

The Intro to The Internals of Structured Queries

Structured Queries
- Apache Spark is a general-purpose distributed compute platform
- Spark SQL is a module of Apache Spark to describe batch queries over structured and semi-structured datasets (of any size)
- Spark Structured Streaming is a module of Apache Spark for streaming queries over unbounded data
- Queries are described using High-Level Query Operators
  - DataFrame API
  - SQL
- In most cases, optimizing streaming queries is to optimize corresponding batch queries
  - No need to focus on streaming features (less to worry about)
  - Caveat: streaming issues may really be related to how streaming queries work

High-Level Query Language - DataFrame API

High-Level Query Language - SQL

QueryExecution
- QueryExecution is the execution pipeline (workflow) of a structured query
- Made up of execution phases

Logical and Physical Operators
- Logical Operators are building blocks of logical query plans in Spark SQL
  - Aggregate
  - Join
  - LocalRelation
  - LogicalRDD
  - MergeIntoTable
  - Project
  - Sort
- Physical Operators are executable nodes of physical query plans in Spark SQL
  - AdaptiveSparkPlanExec
  - BroadcastHashJoinExec
  - HashAggregateExec
  - ObjectHashAggregateExec
  - ProjectExec
  - SortAggregateExec

The Internals of Aggregate Queries
Aggre
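The slides name the DataFrame API and SQL as two ways to describe the same structured query, and QueryExecution as the pipeline that turns it into logical and physical plans. The following is a minimal sketch of how one might look at those plans for a simple aggregation. It is not taken from the talk: the `AggregatePlans` object, the `totalsByKey` helper, the `sales` view, and the sample rows are all hypothetical, and it assumes Spark 3.x is on the classpath and a local session is acceptable.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical example object; not part of the original slides.
object AggregatePlans {

  // Builds a tiny aggregate query with the DataFrame API.
  def totalsByKey(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Made-up sample data, registered as a temp view so SQL can see it too
    val sales = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("key", "amount")
    sales.createOrReplaceTempView("sales")
    sales.groupBy("key").agg(sum("amount") as "total")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for exploring query plans
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("aggregate-plans")
      .getOrCreate()

    // DataFrame API variant of the query
    val byKeyDF = totalsByKey(spark)

    // SQL variant of the same query
    val byKeySQL = spark.sql(
      "SELECT key, sum(amount) AS total FROM sales GROUP BY key")

    // QueryExecution exposes the plan produced by each execution phase
    val qe = byKeyDF.queryExecution
    println(qe.analyzed)       // analyzed logical plan
    println(qe.optimizedPlan)  // optimized logical plan
    println(qe.executedPlan)   // physical plan

    // Prints the logical and physical plans of the SQL variant (Spark 3.0+)
    byKeySQL.explain("extended")

    spark.stop()
  }
}
```

In a setup like this, one would typically expect an Aggregate logical operator and one of the aggregate physical operators from the list above in the printed plans, though the exact output depends on the Spark version and configuration.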
