Global Agile Operations Summit, Guangzhou
PUBLIC

Big Data Intelligent Processing & Data Visualization
Speaker: 吴仕橹

Business Insights & Analytics: How it Works

1) Source systems are ingested into staging (a shared preparation area), typically through Sqoop (database copy), CDC (streaming-style change updates) or Juniper (the in-house platform).
2) System tables are copied into the Discovery environment, where this production data is processed and models/insights are developed post Data Factory.
3) The Data Factory takes raw data through a number of steps:
   i. Profiling: looking at the data to identify its contents and tag it with the correct metadata.
   ii. Cleansing & curating: restructuring the data into the simplest and most efficient form, and highlighting errors to report back to source system owners.
   iii. Enriching: creating new derived fields based on the raw data (e.g. flags) and appending reference data for models to utilise.
   iv. Record linking: using advanced techniques to join up disparate data and masses of separate sources into a single logical model.
   v. Indexing: organising the final data asset into an index, making it quickly searchable.
4) Stabilised assets and models are pushed through our UAT environment for testing and data validation by the consuming users.
5) Final models and assets are then landed in our production environment, their insight ready for consumption through agreed patterns (typically APIs or file transfers).
6) The Data Guardian will control all consumption compliance.
7) Data Exchange hosts APIs/apps to source data to consumers.

Data & Analytics Execution

Automated feed of data, copying the source systems into the GBM Data & Analytics Lake
Data is pre-processed, transformed and optimised by Data Engineers
The tagged data is linked and enriched using machine learning, generating uni