The Hitchhiker's Guide to Delta Lake Streaming
Tristen Wentling, Sr. Solutions Architect, Delta OSS Contributor
Scott Haines, Distinguished Software Engineer, Spark/Delta OSS Contributor
2023
https://bit.ly/dais2023_hgdls

Introductions: Scott Haines
Apache Spark journey began in 2016
Delta Lake journey began in 2019
Nominated to the Databricks Beacons
Published First Big Book on Apache Spark in 2022
Working on First Big Book on Delta Lake
Loves Learning, Teaching, and Mentorship
Spends his days working at one of the world's largest apparel and shoe companies
Enjoys Growing, Making, and Consuming Hot Sauce
Loves his kick-ass wife and two dogs

Introductions: Tristen Wentling
Apache Spark journey began in 2017
Published blogs on streaming with Spark
Working on First Big Book on Delta Lake
Reformed Data Scientist
Currently works on helping customers create solutions in the retail industry
Spends too much time online gaming
Somewhat obsessed with palms and other tropical trees

Preparing for our Journey
What we will probably get through today
Part 1: A Gentle Introduction to Stream Processing with Delta Lake
Part 2: The Hitchhiker's Guide to Delta Lake Streaming
Part 3: And One More Thing
https://bit.ly/hitchhikers-guide-to-dls
https://bit.ly/dldgv2
“Slide Deck Budget Cuts Begin Now!”

Part 1: Delta Lake Streaming 101

First Steps: Delta Lake
What is the Difference between Batch and Stream Processing?
Batch processing can be thought of as taking incremental workloads and handling them in larger groups. The boundaries between the two are mostly semantic, and the methods differ primarily in terms of latency.

Batch - Periodic processes
Scheduled (every day/week/month/) expecting some finite set of data; (common for ETL where files arrive on a scheduled int