Apache XTable (Incubating): Interoperability Between Lakehouse Table Formats (PDF, 28 pages)
APACHE XTABLE: INTEROPERABILITY BETWEEN OPEN TABLE FORMATS
Dipankar Mazumdar & Kyle Weller, Onehouse.ai
13th June 2024
2024 Databricks Inc. All rights reserved

SPEAKER BIOS

Dipankar Mazumdar
Staff Developer Advocate, Onehouse.ai
Contributor: Apache Hudi, XTable
Prev: Data Architecture, Visualization, ML
in/dipankarmazumdar/ | dipankartnt

Kyle Weller
Head of Product, Onehouse.ai
Prev: Product Manager - Azure Databricks, Azure ML, Bing Search
in/lakehouse/ | KyleJWeller

Data Lakes or

Architecture of a Database System
https://dsf.berkeley.edu/papers/fntdb07-architecture.pdf
S3 Data Lake Storage

DATA LAKEHOUSE - UNBUNDLING OF THE DBMS

ORIGIN STORIES
[Timeline figure: three projects, open sourced in 2019, 2018, and 2017]

[Slide heading illegible in source]
- Technical vision and goals are divergent
- The community needs are specialized
- All three projects are on fast growth trajectories
- New table formats are gaining traction: Apache Paimon, [illegible]

TECHNICAL FUNDAMENTALS
- Metadata abstractions on files in cloud object storage
- Tables with SQL semantics and schema evolution
- ACID transactions
- Updates and deletes (merge/upsert)
- Data layout optimizations for performance tuning
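
To make the first fundamental concrete (metadata abstractions on files in cloud object storage), here is a minimal sketch that guesses a table's format from the metadata directory under its base path. The directory names (`.hoodie` for Hudi, `_delta_log` for Delta Lake, `metadata` for Iceberg) reflect each project's usual on-disk layout rather than anything stated on the slides, and a bare `metadata` directory is only a heuristic for Iceberg. The names `detect_table_formats` and `make_demo_table`, and the use of a local temporary directory in place of object storage, are assumptions made purely for illustration.

```python
import tempfile
from pathlib import Path

# Metadata directory each format conventionally keeps under the table's base path.
# Assumption: these names follow the projects' usual layouts, not the slides.
METADATA_DIRS = {
    "hudi": ".hoodie",
    "delta": "_delta_log",
    "iceberg": "metadata",
}


def detect_table_formats(base_path):
    """Return, sorted, the formats whose metadata directory exists under base_path."""
    base = Path(base_path)
    return sorted(
        name for name, marker in METADATA_DIRS.items() if (base / marker).is_dir()
    )


def make_demo_table(root, marker):
    """Create a toy table: one placeholder data file plus a metadata directory."""
    table = Path(root) / f"table_{marker.strip('._')}"
    (table / marker).mkdir(parents=True)
    (table / "part-0000.parquet").touch()  # empty placeholder, not real parquet
    return table


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for marker in METADATA_DIRS.values():
            table = make_demo_table(root, marker)
            print(table.name, detect_table_formats(table))
```

The function returns a list rather than a single name because nothing in this layout prevents more than one metadata directory from sitting beside the same data files.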

HOW IT LOOKS ON CLOUD STORAGE
Fundamentals of table formats
- Hudi, Delta, Iceberg are not that different
- Each has a special metadata layer on top of parquet files

Choose if:
1. Mutable data - GDPR deletes, updates
2. CDC workloads
3. Low