Observability Supercharger
Build the Traffic Topology Map for Millions of Containers with Zero Code
Lim Teck Chuan, Chow Sheng Wei
Engineering Infra, Shopee

What is a topology map?
  [Diagram labels: Service, Kafka, Database]

Why is it helpful?
  [Diagram labels: Workloads, Container, Cache, Database, Queue, Observability, Metrics, Logs, Traces, Business Objectives, Dependency Graph, Datacenter Migration, Resource Accountability, Platform, FinOps, AIOps, DataOps, DevOps]

Dependency graph
  What?
    Stateless/stateful service tagging
  Why?
    Different workload types require different operational procedures

Dependency graph
  What?
    Service to middleware/storage relationships
    Service to service call graphs
  Why?
    Incident response

Resource accountability
  What?
    Resource to service relationships
  Why?
    Cost attribution and budgeting
    Service migrations make accounting difficult

What does our container ecosystem look like?
  Traffic enters our L4/L7 load balancers
  Front services translate HTTP to RPC
  Pods run a variety of workloads
    API services
    Queue consumers
    Cronjobs
  Physical servers
  Across 10 AZs
  [Diagram labels: L4/L7 LBs, Front Services, Front Services, Biz line A, Biz line B, Biz line C, Physical server, Physical server]

Various ways we tried
  Platform workflow
  Client side instrumentation
  Domain sniffing

Platform workflow
  How?
    For all new clusters, relationship binding to a service is mandatory
    For all existing clusters, do a one-time data collection from service owners
  What?
    Lots of legacy clusters had no bindings
    Existing bindings had no guarantee of being correct
  Lessons learnt
    Data collection takes far too long and costs a lot of human labor

Client side instrumentation
  How?
    Add instrumentation in client libraries (Kafka, Redis, etc.)
  What?
    Long rollout
    Required code changes
    Not all services use the internal client libraries
  Lessons learnt
    Code changes take a long time to roll out

Domain sniffing
  How?
    eBPF agent on all machines that intercepts con