From Spaghetti Bowl Pipelines to Lakeflow Declarative Pipeline Efficiency (PDF, 18 pages)
From Spaghetti Bowl Pipeline to DLT Efficiency
Peter Jones | Intermountain Healthcare

Intermountain Health by the Numbers¹
Confidential and property of Intermountain Health
- 4,600+ employed Physicians & APPs
- 33 Hospitals, including 1 Virtual Hospital
- 66,000+ Caregivers
- 1.1 million Members
- 400 Clinics
- 6 Primary States² (UT, NV, ID, CO, MT, WY)
- $16.06 billion Total Revenue
- 4,800 Licensed Beds
¹ Numbers reflect data through year-end, December 31, 2023.
² Intermountain also provides air medical transport services in other states through Classic Air Medical.

Intermountain Health Scope and History in Data

Who am I?
- Former librarian
- Analytics engineer in healthcare
  o Data engineering
  o App development
  o Cloud infrastructure
- Student/worker in medical informatics
- Databricks account admin
- Friend to chickens
This is what ChatGPT thinks I look like. Who am I to argue?

Simple, unsuspecting analytics engineer

The Spaghetti-Bowl Drop

Complications
- Within months of starting the effort, we would lose access to the logic in the original tool.
- Non-standard operations for things like joins, null handling, etc.
- Meanwhile, the original source for these pipelines was being actively decommissioned.
Losing licenses in x months | ClickOps tool eccentricities | Source being decommissioned

The solution
LakeFlow pipelines (aka DLT), LakeFlow jobs, and Databricks Asset Bundles
- Handles references elegantly
- Code ops vs. Click ops
- Cluster management
- Quick scalability
- Job configuration options
- Enables stream processing
- CI/CD integration
- Code and resources together
- Changes are versioned
- Enables CI/CD processes
- Enables use of service accounts

LakeFlow + Asset Bundles vs. Legacy Tool
Legacy Tool
Benefits:
- ClickOps interface
- Tool nodes with configuration
- Pull connections between tools
Drawbacks:
- Proprietary formats and tools
- More difficult collaboration
- Sometimes non-s
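One benefit listed for LakeFlow pipelines (DLT) is that they handle references between tables, in contrast to wiring "pull connections between tools" by hand in the legacy ClickOps tool. The idea, as we understand it, is that each table declares what it reads and the pipeline derives the run order itself. The plain-Python sketch below is not the DLT API; it only illustrates that idea with a topological sort from the standard library. All table names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical table definitions (illustration only, not from the talk):
# each table maps to the set of upstream tables it reads.
DEFINITIONS: dict[str, set[str]] = {
    "raw_encounters": set(),
    "raw_patients": set(),
    "clean_encounters": {"raw_encounters"},
    "clean_patients": {"raw_patients"},
    "encounter_summary": {"clean_encounters", "clean_patients"},
}


def run_order(definitions: dict[str, set[str]]) -> list[str]:
    """Return an order in which every table runs after all tables it reads.

    Raises ValueError for a reference to an undefined table and
    graphlib.CycleError for circular references.
    """
    defined = set(definitions)
    for table, upstream in definitions.items():
        missing = upstream - defined
        if missing:
            raise ValueError(f"{table} reads undefined table(s): {sorted(missing)}")
    # TopologicalSorter expects a mapping of node -> predecessors,
    # which is exactly "table -> tables it reads".
    return list(TopologicalSorter(definitions).static_order())


if __name__ == "__main__":
    print(run_order(DEFINITIONS))
    try:
        run_order({"a": {"b"}, "b": {"a"}})
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of args.
        print("circular reference:", err.args[1])
```

Because the order is derived from the declared reads rather than from the order in which definitions are written, adding a new downstream table only requires declaring its inputs.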
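The complication about non-standard joins and null handling is easier to pin down once standard SQL semantics are written out. In Spark SQL, as in ANSI SQL, a NULL key never satisfies an ordinary equi-join (`=`), while the null-safe operator `<=>` (`Column.eqNullSafe` in PySpark) treats two NULLs as equal. The talk does not say which behavior the legacy tool used, so the plain-Python sketch below simply contrasts the two on hypothetical rows (the `mrn`, `name`, and `visit` columns are invented for illustration), which is one way to check a migrated join against the old output.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical rows; None stands in for a SQL NULL join key.
PATIENTS: list[Row] = [{"mrn": 1, "name": "A"}, {"mrn": None, "name": "B"}]
VISITS: list[Row] = [{"mrn": 1, "visit": "v1"}, {"mrn": None, "visit": "v2"}]


def keys_match(left_key: Any, right_key: Any, null_safe: bool) -> bool:
    """Compare join keys with SQL semantics.

    null_safe=False follows `=`: NULL never matches anything, not even NULL.
    null_safe=True follows Spark SQL `<=>`: two NULLs count as equal.
    """
    if left_key is None or right_key is None:
        return null_safe and left_key is None and right_key is None
    return left_key == right_key


def inner_join(left: list[Row], right: list[Row], key: str, null_safe: bool = False) -> list[Row]:
    """Nested-loop inner join of two row lists on a shared key column."""
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if keys_match(lrow[key], rrow[key], null_safe)
    ]


if __name__ == "__main__":
    print(len(inner_join(PATIENTS, VISITS, "mrn")))                  # 1: NULL-keyed rows are dropped
    print(len(inner_join(PATIENTS, VISITS, "mrn", null_safe=True)))  # 2: NULL <=> NULL matches
```

If a legacy tool silently matched NULL keys, a straight port using `=` would lose those rows, so comparing row counts on NULL-keyed records is a cheap first check.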