Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

Speakers
- Sergey Zavgorodni, Lead Data Engineer, DoorDash
- Ruotian Wang, Software Engineer, DoorDash

Agenda
1. Introductions
2. DoorDash Data Journey
3. Iceberg: Write Once, Read Everywhere
4. Practical Explanation
5. Lessons Learned
6. Q&A

DoorDash
DoorDash is a technology company on a mission to empower local economies by connecting people with the best in their cities. Our vision is to connect every local business to every local consumer, and bring them things in minutes, not days.

1. Introductions
Foundation Data (DoorDash)
- Vision: All product and business decisions are data-driven.
- Data Scale: Hundreds of PB in Snowflake, Delta Lake and Iceberg.
- Workload: Serving tens of millions of queries and Spark jobs on a daily basis.

2. DoorDash Data Journey
Before 2022: Business in Scattered World
[Diagram labels: Microservice DBs, Extract Load, Data Warehouse, Transform, Analytics, Training/Inference, Spark Computes, ML Feature Engineering]

2. DoorDash Data Journey
2022-2024: Delta Lake Expansion
[Diagram labels: Data Warehouse, (Accounting, PA, 3rd party vendors, etc), Data Sharings, Streaming, Batch ETLs]
Good:
- Cost-effective for large-scale operation in Data Lake
- Unlock a wider range of data use cases