DRASTICALLY REDUCING PROCESSING COSTS WITH DELTA LAKE
Generoso Pagano & Mauricio Jost
10-13 June 2024
2024 Databricks Inc. All rights reserved
Amadeus IT Group and its affiliates and subsidiaries

Slide 2: About us
  - Generoso Pagano and Mauricio Jost
  - Principal Data Engineers, Amadeus
  - Mostly having fun with Scala, Spark and Delta Lake

Slide 3: Making travel simpler, smarter and smoother.
  - Online travel agencies
  - Travel agencies
  - Travel management companies
  - Metasearch
  - Tour operators
  - Media players
  - Others
  - Strategic Alliances and Partners

Slide 4: Our product
A complex application with challenging requirements
  - 100s of output tables
  - Several years of historical data
  - History consolidation
  - Join/merge intensive
  - 1000s of Spark jobs

Slide 5: Our cost reduction journey
[Chart: pilot daily cost (%) by milestone, falling from 100% at M1 to 1% at M5, a 99% reduction]
  - M1: baseline (cost: 100%)
  - M2: addressed thread contention
  - M3: z-order, dfp
  - M4: photon, dv
  - M5: revised history consolidation (cost: 1%)

Slide 6: M1: Baseline (beginning of our journey)
  - Functional correctness
  - Technical stability
  - Throughput below expectations
  - CPU usage below 10%
[Journey Tracker #1: chart of cost by milestone, M1 to M5]

Slide 7: Why is CPU usage so low?
But... JSON parsing is CPU intensive!
Post-mortem Spark UI
  - Spark job not retained in UI
  - Unnamed jobs
  - Little workers information
Live Spark UI
  - What are workers doing?
  - Most task threads BLOCKED
  - Thread contention (shared lock)
[Figure: task threads in worker JVM]
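
The "most task threads BLOCKED" observation can also be checked offline from a thread dump of a worker JVM. The following is a minimal sketch, not the authors' tooling: it assumes a dump in the text format printed by jstack, and it assumes task threads can be recognised by the name prefix "Executor task launch worker". The sample dump, the `com.example.Parser` frame and the helper names `count_task_thread_states` and `blocked_share` are all hypothetical and for illustration only.

```python
import re
from collections import Counter

# Hypothetical, hand-written sample in the jstack text format (not data from the talk).
SAMPLE_DUMP = '''\
"Executor task launch worker for task 101" #71 daemon prio=5 os_prio=0 tid=0x01 nid=0x11 waiting for monitor entry [0x01]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Parser.parse(Parser.java:42)

"Executor task launch worker for task 102" #72 daemon prio=5 os_prio=0 tid=0x02 nid=0x12 waiting for monitor entry [0x02]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Parser.parse(Parser.java:42)

"Executor task launch worker for task 103" #73 daemon prio=5 os_prio=0 tid=0x03 nid=0x13 runnable [0x03]
   java.lang.Thread.State: RUNNABLE
        at com.example.Parser.parse(Parser.java:57)

"Executor task launch worker for task 104" #74 daemon prio=5 os_prio=0 tid=0x04 nid=0x14 waiting for monitor entry [0x04]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Parser.parse(Parser.java:42)

"dispatcher-event-loop-0" #30 daemon prio=5 os_prio=0 tid=0x05 nid=0x15 waiting on condition [0x05]
   java.lang.Thread.State: WAITING (parking)
'''

THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State: (?P<state>[A-Z_]+)')


def count_task_thread_states(dump, name_prefix="Executor task launch worker"):
    """Count JVM thread states for threads whose name starts with name_prefix."""
    counts = Counter()
    current_name = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            # A quoted name at the start of a line opens a new thread entry.
            current_name = header.group("name")
            continue
        state = THREAD_STATE.search(line)
        if state and current_name and current_name.startswith(name_prefix):
            counts[state.group("state")] += 1
            current_name = None  # count each thread at most once
    return counts


def blocked_share(counts):
    """Fraction of the counted threads that are in the BLOCKED state."""
    total = sum(counts.values())
    return counts["BLOCKED"] / total if total else 0.0


counts = count_task_thread_states(SAMPLE_DUMP)
print(dict(counts), f"blocked share: {blocked_share(counts):.0%}")
```

A high blocked share combined with low CPU usage points to lock contention rather than a lack of work. The stack frames under the BLOCKED threads then show which monitor they are waiting on.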