From Snowflake To Enterprise-Scale Apache Spark
Nic Jansma, Sr. Principal Lead Engineer (mPulse)
Amir Skovronik, Distinguished Engineer (Asgard)
Akamai
Databricks 2023

mPulse: Real User Monitoring

What is mPulse?
Real User Monitoring (RUM): mPulse provides real-time user experience and performance analytics, and maps those results to business goals and outcomes.

Scale
- 2 billion beacons/day (no sampling!)
- Real-Time (aggregate) dashboards: user experiences are reflected within 5-10 seconds
- 7 TB raw data/day
- Waterfall (individual) dashboards: a full debug trace of every page load and beacon is available within 5 minutes
- 4 TB raw logs/day
- 13 months retention
- 50 fact/dimension tables
- 1 trillion rows
- 1 PB storage
- 60 QPS (queries per second)

Goals of Migration
- Early Snowflake adopter, but needs have changed
- Highest cloud cost for mPulse ($10M/year)
- New Akamai internal team (Asgard) dedicated to providing a data warehouse solution for all of Akamai
- mPulse was to be one of the first large products to transition to Asgard
- Unique technical challenges
- Equal-or-better performance: customers shouldn't notice a difference

Challenges
- Years of assumptions built into mPulse from its Snowflake dependency
- Snowflake made it easy to "throw $ at the problem" by simply up-sizing warehouses, so we never focused on optimization
- Needed a comprehensive query inventory, plus discussions and plans for how to transition each workload
- Other internal teams depend on mPulse data, and they need their own migration paths and hand-holding
- New tooling needs
- Organizationally, two sibling teams (mPulse, Asgard) needed to figure out how to work together and support each other

Asgard: Enterprise-Scale Apache Spark

What is Asgard?
- A homegrown, cloud-based data warehouse
- Snowflake-like deployment model (S/M/L/XL warehouse sizes)
- Snowflake-like ingest API (COPY INTO)
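A useful property of COPY INTO in both Snowflake and Databricks is that it keeps load metadata per target table and, by default, skips source files that were already loaded. This makes re-running an ingest job safe. The sketch below is a toy, in-memory model of that behavior. It assumes, without confirmation from the slides, that Asgard's Snowflake-like API preserves the same skip-already-loaded semantics. The class `CopyIntoTable`, its `copy_into` method, and the file and row shapes are hypothetical illustrations, not Asgard's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class CopyIntoTable:
    """Hypothetical target table modeling COPY INTO-style load tracking.

    Not Asgard's real API: a toy model in which each staged source file
    is ingested at most once, so repeating a load adds no duplicate rows.
    """
    name: str
    rows: list = field(default_factory=list)
    loaded_files: set = field(default_factory=set)  # load metadata: files already ingested

    def copy_into(self, staged_files: dict) -> list:
        """Ingest every staged file not loaded before; return the names actually loaded.

        staged_files maps a file name to the list of rows (dicts) it contains.
        """
        newly_loaded = []
        for file_name in sorted(staged_files):  # deterministic load order
            if file_name in self.loaded_files:
                continue  # already ingested: skip it, which makes re-runs idempotent
            self.rows.extend(staged_files[file_name])
            self.loaded_files.add(file_name)
            newly_loaded.append(file_name)
        return newly_loaded


if __name__ == "__main__":
    table = CopyIntoTable("beacons")
    stage = {
        "part-0001.json": [{"url": "/home"}, {"url": "/cart"}],
        "part-0002.json": [{"url": "/checkout"}],
    }
    print(table.copy_into(stage))  # ['part-0001.json', 'part-0002.json']
    print(table.copy_into(stage))  # [] (nothing new, so no duplicate rows)
    print(len(table.rows))         # 3
```

Keying the metadata on file names, rather than on row contents, mirrors how COPY INTO decides what to skip. The trade-off is that a file that is modified and re-staged under the same name would also be skipped, unless the load is explicitly forced.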