RUNHOUSE: A PYTORCH APPROACH TO ML INFRA
Rohin Bhasin & Caroline Chen
June 11, 2024
2024 Databricks Inc. All rights reserved

MODERN ML INFRA IS FRAGMENTED
Workflows are inflexible, duplicative, and unreproducible.

RESEARCH & EXPERIMENTATION
Notebooks, sandboxes, toy environments
fast iteration
high debuggability
no powerful compute
no collaboration with team

PRODUCTION
DAGs, orchestrators, containers
powerful compute
reliable, stable environments
poor debuggability
over-packaged
inflexible across infra types

TRANSLATION & PACKAGING
1-4 month process
translating code for specific infra, learning DSLs
packaging code into DAGs
containerization of environment

ISSUES STEM FROM FRAGMENTATION
No Infrastructure Flexibility
No E2E Management & Visibility
No ML Flywheel
Slow iteration speed
Duplication everywhere
Blocks scaling and cost optimization
Migrations lead to infra lock-in
Monitoring, control, and allocation are infra-specific and fragmented

WHAT ML DEVELOPMENT SHOULD LOOK LIKE
High Iteration Speed
No excessive builds during dev work
As smooth as developing locally
Central Control
Single control plane for resource visibility and management
Lineage tracking and governance
Multiplayer
Reusable compute and services
Reproducible behavior
Shareable
Infra Agnostic
No migrations and DSLs
Flex