ADOBE's Delta Lake Processes 1 Trillion Rows of Data per Day (PDF, 20 pages)
Trillion Rows per Day Powered by Delta Lake at Adobe
Yeshwanth Vijayakumar, Director of Engineering, Adobe
©2024 Databricks Inc. All rights reserved.

Agenda
Building on earlier talks from 2022 and 2023, and sharing new patterns.
About the Data?
Scaling the Writer
  Thousand Stream problem: managing thousands of Structured Streaming writers at scale
  JVM-agnostic locking for partition-level concurrency control
  Balancing multi-tenancy and single tenancy
Transaction Management and Tracking Delta Using Delta
  2-phase commits
  Append-only Delta tables to track global history across thousands of tables
Data Representation and Nested Schema Evolution
  Strings FTW: data manipulation using UDFs
Maintenance Operations and Their Scaling Gotchas

Unified Profile Data Ingestion
[Diagram labels: Unified Profile, Experience Data Model, Adobe Campaign, AEM, Adobe Analytics, Adobe AdCloud, Change Feed, Streaming Stats Generation, Single Tenant, Multi Tenant, Linking Identities]

Complexities?
Nested fields: a.b.c.d*.e nested hairiness!
Arrays!
MapType
Every tenant has a different schema!
Schema evolves constantly: fields can get deleted or updated.
Multiple sources
  Streaming
  Batch

Scale?
1-2 trillion rows of changes a day
Tenants have 10+ billion rows
PBs of data
Million RPS peak across the system
Triggers multiple downstream applications
  Segmentation
  Activation

CDC (existing)
[Diagram labels: Batch Ingestion / Streaming Ingestion / API-based Ingest, Mutation Apps, Hot Store, CDC]
1. Send request to Cosmos
2. Ack
3. Emit CD
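To put the "1-2 trillion rows of changes a day" figure from the Scale? slide into per-second terms, here is a back-of-the-envelope conversion. It assumes the load is spread evenly over 24 hours (86,400 s). Any burstiness means some instants run above this average.

$$
\frac{1 \times 10^{12}\ \text{rows}}{86\,400\ \text{s}} \approx 1.16 \times 10^{7}\ \text{rows/s},
\qquad
\frac{2 \times 10^{12}\ \text{rows}}{86\,400\ \text{s}} \approx 2.31 \times 10^{7}\ \text{rows/s}
$$

That works out to roughly 12 to 23 million changed rows per second on average. The deck's "Million RPS peak" figure counts requests rather than rows, so the two numbers are not directly comparable.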