© 2024 Databricks Inc. All rights reserved

DATA WAREHOUSE PERFORMANCE ON THE DATA LAKEHOUSE
Sida Shen, Product Manager, CelerData
Eric Sun, Head of Data Platform, Coinbase

Data Lakehouse
Data warehouse on open and standardized storage
Unify batch and near-real-time workloads on a single source-of-truth dataset
Easy data governance, simple architecture, flexibility, and cost-effectiveness
Open and standardized table and file formats | ACID transaction properties | Schema evolution | Compaction | Near-real-time analytics | SQL | Compute | Application

THE REALITY?

THE REALITY
Query engines are not fast enough:
Not optimized for high-concurrency, low-latency workloads
Still built on older technologies
Not able to handle demanding analytics workloads such as customer-facing analytics
The users turn to costly workarounds:
Over-engineering or overspending on their existing query engine for barely passable performance, which is neither sustainable nor future-proof
Forced to move workloads to a proprietary data warehouse purely for query acceleration

Users are forced to copy data out of the lakehouse
Cost of maintaining a proprietary data warehouse
Cost of data ingestion
Challenges from matching schemas, data types, SQL, etc.
Data governance challenges from duplicating the data

THE