Efficient Near Real-Time Event Ingestion using DLT: Insights & Lessons
Kavin, Engineer, Nextdoor
2024 Databricks Inc. All rights reserved

- Nextdoor's mission is to create a kinder world by connecting neighbors and real-world connections
- We operate in the US, Canada, Europe & Australia today and have over 43 million weekly active users
- We run on AWS cloud today
- We are hiring! Apply https:/

Events
- App is hosted in 4 AWS regions
- Up to 400k events/sec
- Includes client events (impression, click, etc.) and server events (requests, ab_tests, etc.)

DLT adoption phases
- Overview
- Development
- Tuning & Optimization
- Observability & Monitoring
- Results

Overview: Nextdoor's event ingestion pipeline pre-DLT
- An HTTP service published events to one Kafka topic per region
- A Kafka Connect application dumped data to an S3 bucket at 1-minute intervals
- An AWS Lambda partitioned the events and wrote the data to another S3 bucket, with retries
- An hourly Airflow job added the partitions to the Hive Metastore
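
The partitioning step in the pre-DLT pipeline can be pictured with a minimal sketch. The slides do not show the Lambda's code, so everything here is an assumption made for illustration: the event fields `event_type` and `timestamp` (epoch seconds), the Hive-style `event_type=/dt=/hour=` layout, and the helper names `partition_prefix` and `group_by_partition`. Reading from and writing to S3 are left out on purpose.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(event: dict) -> str:
    """Return a Hive-style partition prefix for one event.

    The field names and the layout are hypothetical, not taken from the talk.
    """
    ts = datetime.fromtimestamp(event["timestamp"], tz=timezone.utc)
    return f"event_type={event['event_type']}/dt={ts:%Y-%m-%d}/hour={ts:%H}"


def group_by_partition(events: list[dict]) -> dict[str, list[dict]]:
    """Group a batch of events by the prefix they would be written under."""
    grouped = defaultdict(list)
    for event in events:
        grouped[partition_prefix(event)].append(event)
    return dict(grouped)


# A small made-up batch, standing in for one 1-minute file dumped to S3.
batch = [
    {"event_type": "click", "timestamp": 1717243200},
    {"event_type": "click", "timestamp": 1717243260},
    {"event_type": "impression", "timestamp": 1717246800},
]
grouped = group_by_partition(batch)
for prefix, items in grouped.items():
    print(prefix, len(items))
```

With an hourly partition column like the assumed `hour=`, new data only becomes queryable once the partition is registered in the metastore, which is consistent with the hourly Airflow job described above.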

Overview: Goals
Functional
- Events reach the Data Lake in near real time to enable quicker analysis
- At most once instead of at least once event delivery
Non-functional
- No increase in compute and/or storage cost
- Insights into ingestion

After d