
Why Delta Lake is the best storage format for pandas.pdf

Uploaded by: 2*** | ID: 139142 | 2023-06-04 | 36 pages | 4.12 MB

Why Delta Lake is the best storage format for pandas

Matthew Powers, CFA
Databricks - Delta Lake

Developer Advocate at Databricks
Worked in finance for 5 years before programming
Ruby/Rails web dev => data engineering
Long-time Spark blogger
Now blogging at delta.io/blog
Created multiple popular Spark open source projects (GitHub: MrPowers)
Written 2 Spark books

5 Reasons Delta Lake is Awesome for pandas
1. File skipping allows for faster queries
2. Time travel / versioned data
3. Schema enforcement
4. Better partition management
5. Small file compaction & Z Ordering

Reason 1: File skipping makes queries run faster
Reason 2: Time travel / versioned data
Reason 3: Schema enforcement prevents bad appends
Reason 4: Better partition management (adding & deleting partitions)
Reason 5: Small file compaction & vacuum

The Lakehouse architecture is great for pandas too
Problems with data lakes

delta.io/blog
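
Reasons 1 and 2 above can be illustrated with a toy model. The sketch below is not Delta Lake code and does not use the Delta Lake API. It assumes a simplified transaction log in which each commit lists the files it added, along with min/max statistics for one column. The names `FileStats` and `files_to_read` and the file paths are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file metadata: a path plus min/max of one column."""
    path: str
    min_value: int
    max_value: int


def files_to_read(
    log: List[List[FileStats]], lower: int, version: Optional[int] = None
) -> List[str]:
    """Return paths of files that may contain rows with value >= lower.

    log[i] holds the files added by commit i. Passing `version` replays only
    commits 0..version, which is the idea behind time travel.
    """
    last = len(log) - 1 if version is None else version
    visible = [f for commit in log[: last + 1] for f in commit]
    # File skipping: a file whose max is below the bound cannot match,
    # so it never needs to be opened.
    return [f.path for f in visible if f.max_value >= lower]


# Toy log: commit 0 adds two files, commit 1 adds a third.
log = [
    [FileStats("part-0.parquet", 0, 99), FileStats("part-1.parquet", 100, 199)],
    [FileStats("part-2.parquet", 200, 299)],
]

latest = files_to_read(log, lower=150)
as_of_v0 = files_to_read(log, lower=150, version=0)
print(latest)
print(as_of_v0)
```

In this toy model a query for values of at least 150 never opens `part-0.parquet`, and reading as of version 0 ignores the file added by the later commit. A real Delta table's log also records removed files, schema, and other metadata, which this sketch leaves out.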
