© 2024 Databricks Inc. All rights reserved.

Practical techniques for applying data quality in the lakehouse
Liping Huang & Lara Rachidi
11 June 2024

Meet the Speakers

Agenda
  Six Dimensions of Data Quality
  Data Quality Management Lifecycle
  Crawl
  Walk
  Run
  Example Medallion Architecture

Six dimensions model

Dimensions of Data Quality
  Consistency
  Accuracy
  Validity
  Completeness
  Uniqueness
  Timeliness

Data Quality Management Lifecycle
  Discovery

Crawl

Discovery
  Key Stakeholders
  Requirement Gathering Techniques
  Data quality is a team sport

Rule Setting, Cleansing & Standardization

Data Quality Rule Setting
Different rules inform the requirements for six dimensions
  Dimensions: Consistency, Accuracy, Validity, Completeness, Uniqueness, Timeliness
  Rule types: Business Rules, Semantic Rules, Industry Rules, Compliance Rules

Detect inconsistencies
Cause