CDC with DLT and AWS DMS
Neil Patel, Lead SSA
Ganesh Chand, Lead SSA

Contents
- Change Data Capture
- AWS DMS (CDC Tool)
- Databricks Delta Live Table
- Bringing it all Together

Change Data Capture

What is Change Data Capture?
Capture changes from a set of data sources.

OP  EMPLOYEE_ID  SALARY  CDC_TIMESTAMP
I   1            10000   2018-01-01 16:02:00
U   1            11000   2019-01-01 16:02:01
D   1            11000   2019-01-01 16:02:01
I   2            20000   2019-01-10 16:02:00
I   3            30000   2020-01-01 16:02:00

Why CDC?
- Data Replication for BI and AI
- Data Integration
- Event-Driven Architectures
- Regulatory Compliance and Security Audits

How not to do it
JDBC
- Reaching into a production DB to grab large swathes of data just for an ETL process is generally not allowed.
- You will need to keep track of what was last read.
- You will need to reason about Updates and Deletes.
Snapshots
- Dealing with daily snapshots of a database is costly and time consuming.
- As with JDBC, you will need to reason about Updates and Deletes.
- Delta Live Table does provide a way to handle that now (more on that later).

How do we do CDC with Transaction Log?
Transaction Log (aka Binlog for MySQL, WAL for Postgres, etc.)
- Written to a log file and contains some of the following:
  - SQL statements
  - Data that changed (CDC)
  - Who ran the query
- Parse and read the transaction log for the records related to change data. CDC tools are often used for this.

CDC Tools
- Cloud Vendors' Native Services: AWS DMS, Azure Data Factory, GCP Datastream
- SaaS Vendors: Fivetran, Arcion, Informatica, Oracle GoldenGate, Talend, Qlik Replicate, StreamSets, IBM InfoSphere
- Open Source: Debezium

AWS Database Migration Service (DMS)

What is CDC Tool?
Since the Transaction Log contains a variety of information, it needs to be parsed and filtered for the data that will ultimately give us this output:

OP  EMPLOYEE_ID  SALARY  CDC_TIMESTAMP
I110
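
To make the I/U/D feed more concrete, here is a minimal sketch of how change records like the sample rows from the "What is Change Data Capture?" slide could be replayed into a current-state table. It is plain Python, not the DMS or Delta Live Table API. The function name `apply_changes`, the tuple layout, and the way the sample rows are split into columns (OP, EMPLOYEE_ID, SALARY, CDC_TIMESTAMP) are assumptions made for illustration.

```python
from datetime import datetime

# Sample change records (op, employee_id, salary, cdc_timestamp).
# Assumption: this column split is one reading of the flattened slide text.
CHANGES = [
    ("I", 1, 10000, "2018-01-01 16:02:00"),
    ("U", 1, 11000, "2019-01-01 16:02:01"),
    ("D", 1, 11000, "2019-01-01 16:02:01"),
    ("I", 2, 20000, "2019-01-10 16:02:00"),
    ("I", 3, 30000, "2020-01-01 16:02:00"),
]


def apply_changes(changes):
    """Replay insert/update/delete records and return {employee_id: salary}.

    Hypothetical helper: records are applied in timestamp order, keyed on
    employee_id.
    """
    state = {}
    # sorted() is stable, so records sharing a timestamp keep their feed order.
    ordered = sorted(
        changes,
        key=lambda c: datetime.strptime(c[3], "%Y-%m-%d %H:%M:%S"),
    )
    for op, employee_id, salary, _timestamp in ordered:
        if op in ("I", "U"):
            # Inserts and updates both leave the latest salary for the key.
            state[employee_id] = salary
        elif op == "D":
            # A delete removes the key; ignore deletes for unknown keys.
            state.pop(employee_id, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return state


if __name__ == "__main__":
    print(apply_changes(CHANGES))
```

Employee 1 is inserted, updated, and then deleted, so only employees 2 and 3 remain in the resulting state. The update and delete for employee 1 share a timestamp, which is why the sketch relies on a stable sort to preserve feed order; a real pipeline would typically need an explicit sequencing column for such ties.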