2019: The Zen of Building Massive-Scale, High-Availability Cloud Systems (2019年超大规模高可用性云端系统构建之禅.pdf, 37 pages)
Survive in Cloud: The Zen of High Availability at Massive Scale in Cloud

Mobvista
No.1, 950M, 320M, 200+, TOP 10
Mintegral SDK DAU, China, Countries/Regions, world-wide, DMPs DAU
60B daily ad requests

All in Cloud
Publisher, RDS, Offer management, Online DMP
Kinesis, EMR, Redshift*, Big Data & ML, S3
CloudWatch, ES, Metrics & Alarm
SDK, API, Manual
Kinesis, S3, Lambda function, DynamoDB
Tracking Service instances, Spot Fleet, Auto Scaling
ElastiCache, SQS
Volume Processing Service instances, Spot Fleet, Auto Scaling
RTB, Advertiser

Cloud Computing
Quick Scaling, Low Cost, Highly Reliable
On-Demand, Rapid elasticity, Pay per use, Uncertain downtime
Cloud Characteristics
Service Goals
Highly Available

Fault Oriented
Once you accept that failures will happen, you have the ability to design your system's reaction to specific failures.

Isolated Design
Micro Kernel, plug-ins, Extension Points

Isolated Deployment
Ordering Service, Cart Service, Checkout Service, Payment Service, Fulfillment Service

Reused vs. Isolated
Reused logic structure vs. isolated physical structure
Critical Data Collector, Log Data Collector, Data Transform Service

Redundancy
Online Service, Standby Service, Load Balancer
Online Redundancy: Load Balancer

Common Failure Modes

Propagated Failure
Load Balancer, QPS 1500, Max QPS 1000, Rate Limit

Cascading Failure
Client, Service, Service A, Service B, Service C, Service D, Service E
Circuit Breaker, Fallback

Slow Response
A quick rejection is better than a slow response.
Pooled resources are exhausted!

No Unlimited Waiting
Any blocking operation needs a time limit!

Recovery Oriented
“A priori predic