Stranger Triumphs: Automating Spark Upgrades and Migrations at Netflix
Databricks AI Summit 2024
Holden Karau, Bobby Morck

Our Problems
- We have unsupported versions of Spark in production
- When things go wrong, I don't remember what we did 5 months ago, let alone 5 years ago
- They often seem to go wrong when we are trying to focus or sleep
- Spark 2 is very much EOL'd, Spark 4 is coming soon

Why do we have these problems?
- APIs change and code breaks
- Keeping code up to date is not a lot of fun
- Backporting is not fun
- Candy is more fun than taxes*
- Testing data pipelines well is hard
- Some of our data pipelines can have real-world impacts when they go wrong

How can we work around our problem?
Software:
- Automated code update tools (Abstract Syntax Tree (AST) transforms or regexes, both are fine)
- Generated tests
- Automated testing and validation
Social:
- Increase visibility of out-of-date code & change incentives

Ok social first:
- People are way harder than computers
- We gave a deadline (and slipped) like a normal project
- Created visibility
- Found org champions

And now onto computers:
- API changes (and updating your code) are annoying; we can automate some of that
- Testing code you inherited is a nightmare; we can sort-of-kind-of fake some of that (enough*)

Holden Karau
- I'm on the Spark PMC (like tenure :p)
- Worked on Spark for 15 years
- Co-author of Learning Spark (1st ed), High Performance Spark (1st ed and working on 2nd ed)
- Twitter: holdenkarau, bluesky , mastodon holdentech.lgbt
- OOS Livestreams: https:/ https:/
- Outside of work: Queer, Trans, Motorcycles, My Dog

- Engineer on the Big Data Compute Team at Netflix
- Focus on Spark, Hadoop