Lakeflow in Production: Testing, CI/CD, and Monitoring Your Data Pipelines at Scale
Lennart Kats, Principal Engineer, Databricks
Adriana Ispas, Sr. Staff Product Manager, Databricks
Databricks R&D, June 2025

ETL PREPARES RAW DATA FOR ANALYTICS, REPORTING & AI/ML
Operating data pipelines at scale can be complex and brittle
[Diagram labels: Data Warehousing, Business Intelligence, AI/ML, Data Sharing, Enterprise applications, Data Lake; CSV, JSON, TXT; Kinesis, AWS Glue, Kafka; Lakeflow declarative pipelines]

INGEST (Connect): Simple native ingestion connectors
TRANSFORM (Pipelines): Reliable data pipelines made easy
ORCHESTRATE (Jobs): Orchestrate workflows across the entire Data Intelligence Platform

LAKEFLOW DECLARATIVE PIPELINES
Reduced development complexity
BRONZE: Raw ingestion and history
SILVER: Filtered, cleaned, augmented
GOLD: Business-level aggregates

    CREATE STREAMING TABLE raw_data
    AS SELECT * FROM cloud_files("/raw_data", "json")

    CREATE MATERIALIZED VIEW clean_data
    AS SELECT ... FROM raw_data

SQL or Python
Easier to reason about: Declare datasets in SQL/Python & the system automatically orchestrates the resulting DAG
Optimized execution: The system handles changing data, parallel execution, and incremental processing
Easier maintenance: Automates complex activities like recovery & retries, auto-scaling, and performant execution

LAKEFLOW PIPELINES EDITOR
Simplified pipeline authoring
Tailored to declarative coding: Primitives to organize, execute code and iterate on pipeline logic for the declarative paradigm
Modular & structured development: Work on one dataset at a time, and iterate with data previews, contextual errors & visual graph
Guided and robust, yet flexible: Guidance for getting started & accommodating organizations' established practices

[Diagram labels: Development, Production, Privacy, Quality, Performance]
But how do you take your data pipelines to production?

[Diagram labels: Development, Production, Privacy, Quality, Performance]
Many aspec