结构化流式处理：揭开任意有状态操作的神秘面纱.pdf

上传人： 2***

编号：139156

2023-06-04

PDF 48页 2.94MB

《结构化流式处理：揭开任意有状态操作的神秘面纱.pdf》由会员分享，可在线阅读，更多相关《结构化流式处理：揭开任意有状态操作的神秘面纱.pdf（48页珍藏版）》请在三个皮匠报告上搜索。

1、Demystifying Arbitrary Stateful OperationsTaking the fear out of flatMapGroupsWithStateand applyInPandasWithStateAngela Chu,Databricks Sr.Solution Architect/Streaming SMEDatabricks2023My source data was already in the exact format I needed,and my consumers business requirements were all simple and s

2、traightforward to implement.-NobodyWhat well coverWhat are Arbitrary Stateful Operations?Summary of a real-world use caseThinking about flatMapGroupsWithState/applyInPandasWithStatelogicBreaking down the logic into its components data structures,initialization logic,steady-state logic and timeoutsIn

3、tegrate into your Spark Structured StreamStreaming best practices that applyAdditional things to keep in mindArbitrary Stateful OperationsSubstitute the word“Arbitrary”with“Custom”.Spark Structured Streaming has many built-in stateful operations(windowed aggregations,deduplication,etc)where with the

4、 use of watermarks the system automatically stores and manages data in-between microbatches.This data is known as the state.There are use cases that require more complex logic than is available in the built-in operations.In this case,users must be able to define custom state handling code to control

5、 what is stored in-between microbatches and how results are computed.These are arbitrary stateful operations.What are they?Customer Use CaseWhen a transaction record is received for a user,the count of transactions for that user that occurred within the last 5 minutes of the transaction time is calc

6、ulated and written to a table.Only the current count for a given user is kept.If no transactions are received for a user,the count should automatically go down.An ML model uses the count to determine if too many have occurred within the last 5 minutes.Transaction count within the last 5 minutesCusto

结构化流式处理：揭开任意有状态操作的神秘面纱.pdf

相关报告