Keynote: Supporting Large Scale and Reliability Testing in Kubernetes Using KWOK, Yuan Chen (NVIDIA) and Shiming Zhang (DaoCloud)
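Large Scale and Reliability Testing in Kubernetes using KWOK
Shiming Zhang, DaoCloud
Yuan Chen, NVIDIA

Outline
- KWOK overview and demo
- Fault injection for reliability testing and demo
- Summary

KWOK Overview
A regular Kubernetes cluster has a control plane (etcd, the API server, the controller manager, the cloud controller manager, the scheduler, and metrics-server) that talks to the cloud provider API of its environment. It also has worker nodes (Node 1, Node 2), each running a kubelet.

KWOK: Kubernetes WithOut Kubelet
KWOK keeps the control plane (etcd, the API server, the controller manager, the scheduler, and metrics-server) but replaces real nodes with fake nodes. The KWOK controller simulates the node and the kubelet, along with other Kubernetes resources. The control plane can be run from plain binaries, on minikube, or with kwokctl, a command line tool, on the host OS.

KWOK Controller
- Simulates and manages the lifecycle of nodes, pods, and other objects
- Simulates the kubelet and node APIs

kwokctl
A command line tool for cluster creation and management. On a host workstation, it runs the control plane components (etcd, the API server, the controller manager, the scheduler, and metrics-server) either in a container runtime or as binaries, and provides access to the cluster through etcdctl and kubectl.

KWOK: Simulate Node Utilization
Simulated utilization on fake nodes drives autoscalers such as the Horizontal Pod Autoscaler (HPA) and the Vertical Pod Autoscaler (VPA).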
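KWOK's fake nodes are ordinary Node objects with no kubelet behind them; the KWOK controller adopts them and plays the kubelet's part. The following is a minimal Python sketch of generating such nodes, plus pods that can only be scheduled onto them. The `kwok.x-k8s.io/node: fake` annotation, the `type: kwok` label, and the `kwok.x-k8s.io/node=fake:NoSchedule` taint follow the fake-node example in the KWOK documentation. The helper names (`fake_node`, `fake_pod`, `fake_node_list`, `write_manifest`), the `kwok-node-` prefix, the capacity figures, and the `fake-image` placeholder are assumptions for illustration. Which nodes the controller adopts is configurable, so check these values against the KWOK version you deploy.

```python
import json

# Key that KWOK's documented fake-node example uses both as an annotation
# (marking nodes for the KWOK controller) and as a taint key.
KWOK_KEY = "kwok.x-k8s.io/node"


def fake_node(name, cpu="32", memory="256Gi", pods="110"):
    """Return a v1 Node manifest with no kubelet behind it.

    The capacity values are illustrative assumptions, not KWOK defaults.
    """
    resources = {"cpu": cpu, "memory": memory, "pods": pods}
    return {
        "apiVersion": "v1",
        "kind": "Node",
        "metadata": {
            "name": name,
            "annotations": {KWOK_KEY: "fake"},
            "labels": {"type": "kwok", "kubernetes.io/hostname": name},
        },
        "spec": {
            # Real workloads stay off the fake node unless they tolerate this taint.
            "taints": [{"key": KWOK_KEY, "value": "fake", "effect": "NoSchedule"}],
        },
        "status": {"allocatable": dict(resources), "capacity": dict(resources)},
    }


def fake_pod(name, namespace="default"):
    """Return a v1 Pod manifest that the scheduler can only place on fake nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Matches the "type: kwok" label that fake_node() sets.
            "nodeSelector": {"type": "kwok"},
            # Tolerates the fake-node taint so scheduling is allowed.
            "tolerations": [
                {"key": KWOK_KEY, "operator": "Exists", "effect": "NoSchedule"}
            ],
            # Placeholder image: no kubelet ever pulls it, KWOK only simulates the pod.
            "containers": [{"name": "fake-container", "image": "fake-image"}],
        },
    }


def fake_node_list(node_count, pods_per_node=0, prefix="kwok-node-"):
    """Bundle fake nodes, then optional pods, into a v1 List for kubectl apply -f."""
    items = [fake_node(f"{prefix}{i}") for i in range(node_count)]
    items += [fake_pod(f"fake-pod-{j}") for j in range(node_count * pods_per_node)]
    return {"apiVersion": "v1", "kind": "List", "items": items}


def write_manifest(obj, path):
    """Write a manifest as JSON, which kubectl apply -f accepts as well as YAML."""
    with open(path, "w") as f:
        json.dump(obj, f, indent=2)
```

For instance, `write_manifest(fake_node_list(1000, pods_per_node=10), "cluster.json")` describes 1,000 fake nodes and 10,000 pods, the scale shown on the "Create Large Scale Clusters" slide. Running `kubectl apply -f cluster.json` against a kwokctl-created cluster should then create them. Emitting JSON rather than YAML keeps the sketch free of third-party dependencies.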
The Cluster Autoscaler is driven the same way. KWOK simulates metrics and node load; metrics-server serves them through the API server, where the autoscalers and kubectl top read them.

KWOK: Create Large Scale Clusters
1K nodes, 10K pods

KWOK: Low Resource Usage

KWOK Summary
- kwok controller: the core component
  - Simulates the lifecycle of nodes, pods, and other Kubernetes objects
  - Simulates nodes and kubelet APIs
  - Simulates node utilization via kubelet metrics
- kwokctl: a series of command line tools
  - Creates and manages KWOK clusters
  - Dumps and restores cluster snapshots
KWOK is a toolkit for creating and managing large-scale Kubernetes clusters with fake nodes while using minimal resources.

Failure Injection and Reliability Testing

Large Scale Kubernetes GPU Clusters
Hardware architecture and topology:
- NVLink + GPUDirect RDMA
- NUMA binding
- Multi-level east-west (EW) switching fabric
- Rack + spine switch hierarchy
- Network topology
Source: Accelerating AI Workloads with GPUs in Kubernetes, Kevin Klues, Distinguished Engineer & Sanjay Chatterjee, Engineering Manager, NVIDIA, Ke