2020 Flink Summit report [02]: Liao Jiayi, "Single Task Recovery and Regional Checkpoint" (PDF, 36 pages).
Single Task Recovery and Regional Checkpoint
Liao Jiayi, Infrastructure Engineer, ByteDance

#1 Introduction of Single Task Recovery Mechanism
#2 Introduction of Regional Checkpoint Mechanism
#3 Other Optimizations on Checkpoint at ByteDance
#4 Challenges & Future Work

Single Task Recovery Mechanism
#1 Introduction of Single Task Recovery Mechanism

Background of businesses:
Topology of multiple-stream joins
Large traffic (30M QPS) and high parallelism (16k * 16k)
Tolerance of partial data loss within a short period
Guarantee of continuous output
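Why is a single failure so expensive in a multi-stream join at this scale? Here is a rough sketch under simplifying assumptions. Flink's region failover strategy (`jobmanager.execution.failover-strategy: region`) restarts the whole pipelined region that contains the failed task, meaning every task connected to it through pipelined (streaming) data exchanges. When every source subtask hash-partitions into every join subtask, that region is the entire job. The Python below models this with a hypothetical graph builder (`build_join_graph`, task names such as `join-0`) and a toy parallelism of 4 instead of 16k. The `individual` branch only models which tasks get restarted, not how in-flight data is handled.

```python
from collections import defaultdict


def build_join_graph(num_sources, parallelism):
    """Hypothetical model of a multi-stream join: each of `num_sources`
    source operators has `parallelism` subtasks, and every source subtask
    sends to every join subtask (all-to-all hash partitioning)."""
    edges = defaultdict(set)
    joins = [f"join-{j}" for j in range(parallelism)]
    for s in range(num_sources):
        for i in range(parallelism):
            src = f"source{s}-{i}"
            for join in joins:
                # In a streaming job every exchange is pipelined, so each
                # edge links both tasks into the same failover region.
                edges[src].add(join)
                edges[join].add(src)
    return edges


def pipelined_region(edges, task):
    """All tasks reachable from `task` over pipelined edges (connected component)."""
    seen, stack = {task}, [task]
    while stack:
        current = stack.pop()
        for nxt in edges[current]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def tasks_to_restart(edges, failed_task, strategy):
    """Set of tasks restarted after `failed_task` fails, per strategy."""
    if strategy == "region":
        return pipelined_region(edges, failed_task)
    if strategy == "individual":
        # Only the failed task is restarted; the others keep running.
        return {failed_task}
    raise ValueError(f"unknown strategy: {strategy}")


graph = build_join_graph(num_sources=2, parallelism=4)
print(len(tasks_to_restart(graph, "join-0", "region")))      # 12
print(len(tasks_to_restart(graph, "join-0", "individual")))  # 1
```

Raising `parallelism` does not change the picture. The region still contains every task, so region failover behaves like a full restart. Restarting only the failed task leaves the other tasks, and their caches, untouched. This is presumably why tolerance of short-lived partial data loss is listed as a business requirement.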
[Figure: log streams → Joiner service]

Problems & Ideas

Problems:
A single task failure causes a full restart, which may last a few minutes (5+ min) and interrupts the job.
Cached data in the tasks is lost and has to be recalculated, which is costly.

Idea:
Use the Individual Task Failover strategy instead of the Region Failover strategy.
Normal tasks can still s