Trillion Rows per Day Powered by Delta Lake at Adobe
Yeshwanth Vijayakumar
Director of Engineering, Adobe
2024 Databricks Inc. All rights reserved.

Agenda
Building over earlier talks in 2022 and 2023, and sharing new patterns
  About the data?
  Scaling the Writer
    Thousand Stream problem: managing thousands of Structured Streaming writers at scale
    JVM agnostic locking for partition level concurrency control
    Balancing Multi Tenancy and Single Tenancy
  Transaction Management and tracking Delta using Delta
    2 phase commits
    Append-Only Delta Tables to track global history across thousands of tables
  Data representation and nested schema evolution
    Strings FTW: data manipulation using UDFs
  Maintenance Operations and Their Scaling Gotchas

Unified Profile Data Ingestion
[Diagram labels: Unified Profile; Experience Data Model; Adobe Campaign; AEM; Adobe Analytics; Adobe AdCloud; Change Feed; Streaming Stats Generation; Single Tenant; Multi Tenant; Linking Identities]

Complexities?
  Nested Fields
    a.b.c.d*.e nested hairiness!
    Arrays!
    MapType
  Every Tenant has a different Schema!
  Schema evolves constantly
    Fields can get deleted, updated.
  Multiple Sources
    Streaming
    Batch

Scale?
  1-2 Trillion Rows of changes a day
  Tenants have 10+ Billions of rows
  PBs of data
  Million RPS peak across the system
  Triggers multiple downstream applications
    Segmentation
    Activation

CDC (existing)
[Diagram labels: Batch Ingestion / Streaming Ingestion / API based Ingest; Mutation Apps; Hot Store; CDC]
  1. Send Request to Cosmos
  2. Ack
  3. Emit CD