USING LAKEHOUSE TO FIGHT CANCER
Ontada's Journey to Establish a Real-World Data (RWD) Platform on Databricks Lakehouse
Databricks 2023

About The Speaker
Donghwa Kim is the Senior Director of Architecture at Ontada, a McKesson company. He is responsible for delivering the next generation Data and Analytics platform using Databricks Lakehouse. Prior to joining Ontada, Donghwa worked as an enterprise architect, migrating a large-scale on-prem data warehouse onto Databricks at Veterans Affairs (VA) and at Centers for Medicare and Medicaid Services (CMS). Donghwa has over 20 years of IT experience within the healthcare and finance industries.

Agenda
01 Ontada Introduction
02 RWD, RWE and Oncology
03 Ontada Data Platform (DMI)
04 Oracle to Databricks Migration
05 Benefits of Lakehouse

Ontada Introduction
Transforming the fight against cancer
Ontada, a McKesson business, is an oncology real-world data and evidence, clinical education, and provider technology business dedicated to transforming the fight against cancer.
- Our Provider Network: 1.4M+ patients seen, including The US Oncology Network
- Real-World Oncology Data & Insights: 2.4M+ patient records available for research
- Market-Leading Provider Technology: 2.6K+ providers use iKnowMed(SM)
Reference: https:/

Ontada Products: Provider Solutions
- iKnowMed (Premier Oncology EHR): comprehensive clinical support for patients
  - Updated library of regimens
  - Drug availability
  - Diagnosis-level biomarker testing
  - And more for the best clinical outcomes
- Clear Value Plus (Decision Support Application)
  - Intelligent matching of each patient to best evidence-based treatment options
  - Integrated clinical and financial information at the point of need
- Ontada Health (Patient Portal): enables patients to
  - View their health records and lab results
  - Communicate with their providers
- Practice Insights (Performance Analytics Tool): proactive and actionable insights into
  - Quality initiatives
  - Value-Based Care (VBC) programs
  - Performance metrics
  - Productivity measures
  - Peer/industry benchmarks

Ontada Products: Life Science
- Real-World Data: fit-for-purpose data products to support oncology decisions and research. Provides the clearest view into the full cancer patient journey, including:
  - Diagnosis, staging, and treatment decision
  - Therapy initiation and maintenance/disease progression
  - Treatment decision, adverse events, and survivorship
- Provider Engagement: customized educational programs and unique channels to help providers achieve their commercial goals:
  - Provider education
  - Patient education
  - Market research
- Real-World Research: Ontada's offering to help providers achieve outcomes-based value with:
  - Real-World Evidence (RWE)
  - Health Economics Outcomes Research (HEOR)
  Ontada's real-world data (RWD) and extensive oncology expertise help providers advance clinical research, expand access, and improve outcomes.

RWD, RWE, and Cancer Care

U.S. Cancer Statistics
- 1,958,310 new cancer cases (2023 projection)
- 609,820 cancer deaths (2023 projection)
- The second-leading cause of death after heart disease
- The leading cause of death among women, 40 to 79 years
- The leading cause of death among men, 60 to 79 years
The Silver Lining
- Overall cancer mortality continues to decline
- 33% decrease since 1991
But we need to do more!
Reference: https:/

RWD and RWE
Real-World Data (RWD)*: "Data relating to patient health status and/or the delivery of health care routinely collected from a variety of sources."
Real-World Evidence (RWE)*: "The clinical evidence about the usage and potential benefits or risks of a medical product derived from the analysis of RWD."
- RWD: patients' health and care journeys; data from many sources (EHR, claims and billing, surveys, and patient generated)
- Analysis: turns RWD into RWE
- RWE: clinical evidence; benefits/risks of a medical product
*Reference: Framework for FDA's Real-World Evidence Program. Accessed https://www.fda.gov/media/120060/download

RWD, RWE and Oncology
Real World Data -> Real World Evidence -> Cancer Drug Research & Approval
"Each patient's cancer journey, strengthened by real-world data and evidence, paves the way for safe and effective prevention, detection, and treatment, turning personal battles into collective transformative victories."
- Sagran Moodley, Chief Innovation and Technology Officer for Ontada

Ontada Data Platform (DMI) Migration Journey

What is the Ontada DMI?
The next generation analytics platform built on Azure Databricks
Agility, Scalability, Governance
- Replaces the legacy on-prem data warehouse and cloud-based genomic data pipeline infrastructure
- Provisions real-world data and evidence (RWD/E) generation to accelerate life science research, inform decisions at the point of care, and provide data-driven patient treatment
- Improves end-to-end data capture, transformation, curation, standardization, and observability
- Acts as a single access point for all of Ontada's data and computation needs:
  - Reporting
  - Data analysis
  - Advanced analytics
  - Ad-hoc explorations

Common Challenges For Data Platforms
- Data Silos
- Complex Access Control
- Resource Contention
- Difficult Data Governance
- Operational Overhead
- Data Disharmony
Without Lakehouse: unhappy customers

Why Databricks?
- Complexity kills speed -> Simplified infrastructure and tools (batteries included)
- Manual processes -> Built-in automated workflows and orchestration
- Data quality & consistency -> Delta Lake + Unity Catalog
- Need for ALL analytics -> Lakehouse

Architecture diagram labels:
- Zones: On-Prem & Cloud, Azure Databricks
- Data Source: Oracle DB, Mongo DB, SFTP, S3 Bucket
- Data Ingestion: Batch, Streaming
- Bronze: "Raw" data from internal and external data sources
DMI Lakehouse (Ontada Lakehouse)
- Silver: Common Data Model (Silver)
- Gold: Common Data Model (Gold)
- Data Delivery: BI/SQL, Advanced Analytics, Data Extraction, Self-Serve Reporting, Delta Sharing
- Consumers: BI Developers, Data Scientists, Data Analysts, Data Stewards
- Other diagram labels: Azure Data Lake Storage Gen 2, Unity Catalog, CDC, Autoloader, Batch & Streaming, Jobs Clusters, GP Clusters, SQL Warehouse, User Provisioning, Access Control, Master Data Management, Compliance Monitoring, FinOps, Automation, + Other Data Sources

DMI Use Cases - Highlights
- Need: Harmonize data from disparate sources
  Solution: Common Data Model
  - Driver for Real World Data Enterprise
  - Oncology-focused, scalable, and interoperable data model
  - FHIR mCODE and Genomics Reporting IG
- Need: Abstract information from unstructured data
  Solution: NLP
  - Minimize the time and cost associated with chart abstraction
  - Biomarker extraction from clinical notes
  - Leverage both commercial and open-source libraries
- Need: Expand Clinico-Genomics data product offerings
  Solution: Genomics Data Processing
  - Ingest, process and analyze raw genomics files
- Need: Generate and share reports + dashboards
  Solution: Self-Serve Reporting
  - Easy sharing of aggregated data with external clients, plus self-service access to dashboards and reports
  - Allows data to be published to support self-serve reporting

On-Prem DW to Databricks Migration

Migration Timeline
Quarters: Q3 22, Q4 22, Q1 23, Q2 23
Milestones and workstreams shown on the timeline:
- DMI Vision
- Contract Execution
- Data Engineering and NLP Pilot
- Databricks Deployment
- Life Science Migration Development
- Databricks Professional Service
- Data Validation Framework
- Data Ingestion
- Common Data Model Development
- Delivery of Real-World Enterprise Product to Market

Migration Lessons Learned
01 Get executive sponsorship
02 Parallel execution
03 Start conversations with DBAs early
04 Beware of code conversion limitations
05 Set validation strategy
06 Monitor, early feedback and repeat

Lessons Learned 01
Get executive sponsorship
- Financial Commitment
  - Initial and continued
  - Provide clarity on the total budget
  - Long term journey and investment
- Frequent Vision and Execution Alignment
  - Ontada Executive Sponsor & DMI Delivery Team
  - Ontada Executive Sponsor & Databricks Industry Vertical Leadership
  - DMI Delivery Team & Databricks Account Team
  - Databricks Business Value Consulting
(Diagram: Executive Sponsor, Delivery Team, Industry Vertical Leadership, Account Team)

Lessons Learned 02
Parallel execution
- Have a strategy for parallel execution
  - Current workload
  - Migration
- Be ready to juggle
  - Competing priorities
  - You cannot predict unknown unknowns
- RAID: Risks, Assumptions, Issues, and Dependencies
- (Seriously) Consider Databricks Professional Service or their SI partner ecosystem
RAID: Don't let those risks encroach on you!

Lessons Learned 03
Start conversations with DBAs early
- Workload schedule
  - Understand the schedule of existing jobs (competing for the resource)
  - Find slots for new ingestion job(s)
- Resource contention: CPU, Memory, Disk
  - Ensure current critical jobs are not impacted
  - CDC requires supplemental logging turned on
  - On-prem disk space and cost
- Testing data ingestion in QA environment
  - Running production-like workloads (via data refresh)
  - Need collaboration with DBAs and QAs

Lessons Learned 04
Beware of code conversion limitations
- DDL Generation
  - Manual generation of table schema
- Time Zone Conversion
  - Align on proper time zone strategy
  - Have early discussion with the product team
  - Session vs. Global Configuration
- Oracle PL/SQL Procedures and Functions
  - Spark SQL limitations

Lessons Learned 05
Set validation strategy
- Testing Data
  - Static Data: T (Transformation) Pipeline Validation
    - Catalog-to-catalog match in Databricks
    - Iterations and cycles: multiple catalogs
  - Live Data: (EL) Ingestion Validation
    - Oracle-to-Databricks match
- Validation Automation Framework
  - Level 1: Count and checksum validation
  - Level 2: Logic validation, which requires working with data users
- Implement quality checks at every layer

Lessons Learned 06
Monitor, send feedback and repeat
- Financial
  - Monitor and prevent run-away cost
  - Provide early feedback on consumption
  - Frequently validate the usage
  - Leverage FinOps Principles
- Compliance
  - Who, What, When, and Why
  - Establish a matrix of user groups and data assets
  - Develop an access approval process
  - Create regular compliance reports

User Access Control and Compliance
Getting Most Out of Unity Catalog
- Automated User Provisioning using SCIM
  - Windows AD - Azure AD - Databricks User Groups
  - Integrate with ServiceNow workflows
- Fine-Grained Access Control
  - Appropriate privileges on data objects based on group association
- Ongoing compliance report generation using Python SDK
  - databricks_cli.groups.api
  - databricks_cli.unity_catalog.api

Benefits of Lakehouse

The Outcomes
Description: Key Impacts
- Data Availability
  - Faster data ingestion from disparate data sources
  - Quality data via automated data validation framework
- Delivery Speed
  - Improved speed to market
  - Expedited product development and delivery
  - Ability to run parallel workstreams via dedicated compute availability
- Enhanced User Experience
  - Exceptional performance improvement (10x in certain use cases)
  - User collaboration via notebook, GitHub integration
  - One-stop shop: Data Engineering, Data Science, ETL, SQL analytics
- Future Ready
  - Ready to develop and deliver new LS products
  - All data types (structured, semi-structured, unstructured), all analytics (descriptive, predictive, prescriptive)

Find Us on Databricks Marketplace
Search: Ontada

Databricks Data for Good Award Finalist
https:/
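
Sketch: Level 1 validation (count and checksum)
The "Level 1: Count and checksum validation" step from Lessons Learned 05 could look roughly like the following. This is only an illustrative sketch, not Ontada's validation framework: it uses plain Python over in-memory rows rather than Oracle or Databricks connections, and the names table_fingerprint and validate_level1 are hypothetical. It assumes both sides render values to strings identically, which a real Oracle-to-Databricks comparison would need to enforce (for example for dates and decimals).

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple

Row = Sequence[object]


def table_fingerprint(rows: Iterable[Row]) -> Tuple[int, str]:
    """Return (row_count, checksum) where the checksum ignores row order."""
    count = 0
    combined = 0
    for row in rows:
        # Join columns with a separator unlikely to occur in the data;
        # NULLs get a fixed token so they hash the same on both sides.
        encoded = "\x1f".join("<NULL>" if v is None else str(v) for v in row)
        digest = hashlib.sha256(encoded.encode("utf-8")).digest()
        # Adding per-row digests modulo 2**256 is order-independent,
        # and duplicate rows still change the total.
        combined = (combined + int.from_bytes(digest, "big")) % (1 << 256)
        count += 1
    return count, f"{combined:064x}"


def validate_level1(source_rows: Iterable[Row], target_rows: Iterable[Row]) -> Dict[str, object]:
    """Compare row counts and checksums of a source and a target table."""
    src_count, src_sum = table_fingerprint(source_rows)
    tgt_count, tgt_sum = table_fingerprint(target_rows)
    return {
        "count_match": src_count == tgt_count,
        "checksum_match": src_sum == tgt_sum,
        "source_count": src_count,
        "target_count": tgt_count,
    }


if __name__ == "__main__":
    # Hypothetical sample rows standing in for an Oracle table and its
    # Databricks copy; the same rows arrive in a different order.
    oracle_rows = [(1, "A", None), (2, "B", "2023-01-05")]
    databricks_rows = [(2, "B", "2023-01-05"), (1, "A", None)]
    print(validate_level1(oracle_rows, databricks_rows))
```

A check like this only shows that the two copies hold the same rows. It says nothing about whether the transformation logic is right, which is what the Level 2 validation with data users covers.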