Data Productivity at Scale
April 18, 2024
Copyright 2024 Tobiko Data, Inc.

About Me
Co-Founder & Chief Architect at Tobiko Data
Leading the development of SQLMesh
Over 10 years of experience in data/ML infra
Previously Netflix, Apple

Why am I here?
How do we test our pipeline changes today?
Environments!
Creating environments for data is cumbersome
Populating new environments with data is inefficient
Development iterations are slow
Production deployments are ad-hoc and error-prone
Scale
Organizational Complexity

Three Pillars of Data Productivity
Fast, Safe & Cost-Effective development process
Automatic Data Contracts (ADC)
Virtual Data Envs (VDE)
Data Versioning (DV)

Model: T in ETL
SELECT a, b FROM source WHERE c 0

Model
SELECT a, b FROM source WHERE c 0
execution
Model Snapshot
Model Fingerprint (Hash)

Snapshots in the Data Warehouse

Pillar of Data Versioning
Fast, Safe & Cost-Effective development process
Automatic Data Contracts (ADC)
Virtual Dev Envs (VDE)
DV: Table per Snapshot, Model Snapshot, Model Fingerprint

Virtual Data Environments

Virtual Data Environments: Flow
1. Starting production environment
2. Create development as a mirror of production
3. Make a change to Model A
4. Deploy model to production

It's nice and all, but what if
The table is too large and costly
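The fingerprint, table-per-snapshot, and four-step environment flow from the slides above can be sketched in a few lines of Python. This is a minimal illustration, not SQLMesh's actual API. The names `fingerprint`, `Warehouse`, `plan`, `create_env`, and `promote` are hypothetical, the example SQL predicates are made up, and hashing the whitespace-normalized SQL text is an assumed simplification of how a model fingerprint might be computed.

```python
import hashlib
from dataclasses import dataclass, field


def fingerprint(sql: str) -> str:
    """Hypothetical fingerprint: hash of the whitespace-normalized SQL text."""
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:8]


@dataclass
class Warehouse:
    """Toy stand-in for a warehouse (all names here are assumptions)."""

    # Physical tables, one per snapshot: table name -> SQL that defines it.
    tables: dict = field(default_factory=dict)
    # Environments: env name -> {model name -> physical table name}.
    # Each entry plays the role of a view pointing at a snapshot table.
    envs: dict = field(default_factory=dict)

    def plan(self, env: str, model: str, sql: str) -> str:
        """Snapshot a model version and point the environment at its table."""
        table = f"{model}__{fingerprint(sql)}"
        # An unchanged model hashes to an existing table, so nothing is rebuilt.
        self.tables.setdefault(table, sql)
        self.envs.setdefault(env, {})[model] = table
        return table

    def create_env(self, new_env: str, source_env: str) -> None:
        """Mirror an environment by copying pointers only, never data."""
        self.envs[new_env] = dict(self.envs[source_env])

    def promote(self, source_env: str, target_env: str, model: str) -> None:
        """Deploy by repointing the target environment at the tested table."""
        self.envs[target_env][model] = self.envs[source_env][model]


wh = Warehouse()
# 1. Starting production environment
wh.plan("prod", "model_a", "SELECT a, b FROM source WHERE c > 0")
# 2. Create development as a mirror of production
wh.create_env("dev", "prod")
# 3. Make a change to Model A: new fingerprint, new table, prod untouched
wh.plan("dev", "model_a", "SELECT a, b FROM source WHERE c > 1")
# 4. Deploy model to production: a pointer swap, no recomputation
wh.promote("dev", "prod", "model_a")
```

In this sketch an environment is only a set of pointers, so creating one costs nothing, and a deployment reuses the exact table that was validated in development.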