Putting Chaos Engineering into Practice to Continuously Improve System Resilience (Featuring PayerMax), 31 pages
© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved.

谷雷 (Gu Lei), Solutions Architect, Amazon Web Services
“Chaos Engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production.”
https://principlesofchaos.org

VALET
[Slide text not recoverable; surviving labels: ALB, Redis CPU, Cassandra, 300 ms]

VALET
Availability
Latency
Error
Ticket
Volume - QPS, TPS

[Slide text not recoverable; surviving labels: 3000 TPS, EKS, 40%, API, P99, 100 ms, EKS, 2, Pod, 3, 30]
TEMPLATE
Workload Name: Realtime Payments
Chaos Experiment: Brownout - Terminate 40% nodes
Action:
Environment: Staging
Duration: 30 minutes
Load: 300 TPS
Targets: Amazon EKS Payments Cluster
Fault isolation boundary:
Stop Condition: CW Alarm when node count 60%
Rollback: CFN template to build nodes
Observability/Logging: OpenSearch/CWL/X-ray
Hypothesis: At a rate of 300 TPS, if 40% [...]
Findings: Some clients' calls will time out
COE Actions to mitigate fault:
Resource Tag/ID/Filter:
Chaos-ready:
Contribution:
Resilience:

GameDay
[Slide text not recoverable; surviving labels: GameDay, GameDays]
Amazon Fault Injection Simulator (Amazon FIS) + Amazon SSM Chaos Runner
Chaostoolkit + ChaosIQ Chaos Amazon
Litmus (doc, GitHub)
ChaosMesh

[Slide text not recoverable; surviving labels: (FIS), (EC2), (EC2) API, CPU, (EC2]
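The experiment template earlier in the deck (Realtime Payments, Staging, 30 minutes, 300 TPS) pairs a fault with a stop condition. The following is a minimal Python sketch of how such a template and its stop check might be written down. It is not the presenters' tooling and calls no AWS API. The class name `ChaosExperiment`, the function `should_stop`, and all field names are hypothetical. The slide's stop condition reads only "CW Alarm when node count 60%", with the comparator lost, so the sketch assumes the alarm fires when the healthy-node fraction falls below 60%.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ChaosExperiment:
    """Hypothetical record; only values visible on the slide are filled in."""
    workload_name: str
    chaos_experiment: str
    environment: str
    duration_minutes: int
    load_tps: int
    target: str
    rollback: str
    observability: Tuple[str, ...]
    # ASSUMPTION: the slide's comparator is lost; 60% is treated as a lower bound.
    stop_below_node_fraction: float


def should_stop(exp: ChaosExperiment, healthy_nodes: int, baseline_nodes: int) -> bool:
    """Return True when the healthy-node fraction drops below the stop threshold."""
    if baseline_nodes <= 0:
        raise ValueError("baseline_nodes must be positive")
    return healthy_nodes / baseline_nodes < exp.stop_below_node_fraction


realtime_payments = ChaosExperiment(
    workload_name="Realtime Payments",
    chaos_experiment="Brownout - Terminate 40% nodes",
    environment="Staging",
    duration_minutes=30,
    load_tps=300,
    target="Amazon EKS Payments Cluster",
    rollback="CFN template to build nodes",
    observability=("OpenSearch", "CWL", "X-ray"),
    stop_below_node_fraction=0.60,
)

# Terminating 40% of 10 nodes leaves 6 (exactly 60%), so the experiment continues.
print(should_stop(realtime_payments, healthy_nodes=6, baseline_nodes=10))  # False
# Losing one more node crosses the assumed threshold and should halt the run.
print(should_stop(realtime_payments, healthy_nodes=5, baseline_nodes=10))  # True
```

Under this assumed threshold, the planned fault (40% of nodes terminated) sits exactly on the boundary without tripping the stop condition, while any further node loss halts the run.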