Is Your GPU Really Working Efficiently in the Data Center? N Ways to Improve GPU Usage
Xiao Zhang, DaoCloud & Wu Ying Jun, China Mobile Cloud

About us
Xiao Zhang, software engineer, DaoCloud (GitHub: wawa0210)
Wu Ying Jun, China Mobile Cloud (GitHub: wuyingjun-lucky)

Challenge: availability, cost, infrastructure, utilization
Copyright 2024 by ClearML. All rights reserved. | All trademarks are the properties of their respective owners.

Challenge: low utilization
Nearly 75% of users have a GPU utilization rate of no more than 70%. Maximizing resource utilization through orchestration or other tools therefore becomes a key consideration.

Training and inference of LLMs
Issues: increasing parameter scale and sample data; rapidly growing demand for computing power; insufficient training scale; inefficient and unstable training.
Demands: improve LLM training scale and efficiency; extend the stable training period.
Cloud native becomes the solution for LLM training.

Challenge 1: How to use cloud-native technology to improve training scale and efficiency
Scale:
Model parallelism (tensor + pipeline) resolves the problem of excessive parameter scale.
Data parallelism addresses the issue of excessive sample data.

Orchestration
Figure: four tensor-parallel (TP) groups of GPU0-GPU3, combined through pipeline parallelism (PP) and data parallelism (DP); each batch is split into mini-batches; GPUs connect through leaf and spine switches.

Keys:
Orchestration
Network topology: dual- (or triple-) layer parameter network built on TOR switches
Communication overhead: tensor parallelism, data parallelism, pipeline parallelism

Orchestration in a Kubernetes cluster
Training job
Nodes (k8s node x N) labeled aiops.io/switch:leaf1: tensor parallelism & data parallelism training pods
Spine-leaf
Nodes (k8s node x N) labeled aiops.io/switch:leaf2: tensor parallelism & data parallelism training
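The orchestration diagram arranges GPUs into tensor-parallel groups of four, which are chained by pipeline parallelism and replicated by data parallelism. The following is a minimal Python sketch of how global GPU ranks could be mapped onto those three groupings. It assumes tensor parallelism varies fastest, then data parallelism, then pipeline parallelism. That is a common convention, but the slides do not specify an ordering, so treat it as an assumption. All function and type names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParallelCoord:
    tp: int  # position inside the tensor-parallel group
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index


def rank_to_coord(rank: int, tp_size: int, dp_size: int, pp_size: int) -> ParallelCoord:
    """Map a global rank to (tp, dp, pp); TP varies fastest, then DP, then PP (assumed order)."""
    world = tp_size * dp_size * pp_size
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} is outside world size {world}")
    return ParallelCoord(
        tp=rank % tp_size,
        dp=(rank // tp_size) % dp_size,
        pp=rank // (tp_size * dp_size),
    )


def tensor_parallel_groups(tp_size: int, dp_size: int, pp_size: int) -> List[List[int]]:
    """Consecutive ranks form one TP group (e.g. GPU0-GPU3 inside one node)."""
    world = tp_size * dp_size * pp_size
    return [list(range(start, start + tp_size)) for start in range(0, world, tp_size)]


def data_parallel_groups(tp_size: int, dp_size: int, pp_size: int) -> List[List[int]]:
    """Ranks sharing (tp, pp) hold model replicas and all-reduce gradients together."""
    return [
        [pp * tp_size * dp_size + dp * tp_size + tp for dp in range(dp_size)]
        for pp in range(pp_size)
        for tp in range(tp_size)
    ]


def pipeline_parallel_groups(tp_size: int, dp_size: int, pp_size: int) -> List[List[int]]:
    """Ranks sharing (tp, dp) form one pipeline, with one rank per stage."""
    return [
        [pp * tp_size * dp_size + dp * tp_size + tp for pp in range(pp_size)]
        for dp in range(dp_size)
        for tp in range(tp_size)
    ]


if __name__ == "__main__":
    # 16 GPUs: 4-way TP x 2-way DP x 2-way PP (illustrative sizes only)
    print(tensor_parallel_groups(4, 2, 2))
    print(data_parallel_groups(4, 2, 2))
    print(pipeline_parallel_groups(4, 2, 2))
    print(rank_to_coord(13, 4, 2, 2))
```

With these sizes, ranks 0-3 form the first TP group, rank 0 shares a DP group with rank 4, and rank 0 shares a pipeline with rank 8. Keeping TP groups on consecutive ranks means that if ranks 0-3 sit on one node, the typically most frequent (per-layer) tensor-parallel traffic stays on intra-node links.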
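The diagram splits each batch into mini-batches that flow through the pipeline stages. One way to see why this matters is to look at the pipeline "bubble", the share of time a stage sits idle while the pipeline fills and drains. The sketch below assumes a synchronous GPipe-style schedule with equal per-stage cost, where the slide's mini-batches play the role of micro-batches. The slides do not say which schedule is used. The function name is hypothetical.

```python
from fractions import Fraction


def pipeline_bubble_fraction(stages: int, micro_batches: int) -> Fraction:
    """Idle fraction per device for a synchronous GPipe-style pipeline.

    Assumes every stage needs the same time per micro-batch: each device is busy
    for `micro_batches` slots out of `micro_batches + stages - 1` slots per pass.
    """
    if stages < 1 or micro_batches < 1:
        raise ValueError("stages and micro_batches must be positive")
    return Fraction(stages - 1, micro_batches + stages - 1)


if __name__ == "__main__":
    for m in (1, 4, 16):
        print(f"2 stages, {m} micro-batches: bubble = {pipeline_bubble_fraction(2, m)}")
```

Under these assumptions, with two stages the idle share drops from 1/2 with a single batch to 1/17 with 16 micro-batches. This illustrates why splitting the batch helps keep pipeline-parallel GPUs busy.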
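To put a rough number on the data-parallel part of the communication overhead, the sketch below models gradient synchronization as a ring all-reduce. Ring is one of the algorithms NCCL can use, but assuming it here is a simplification. Function names and the 1 GB payload are hypothetical, and the time estimate is a bandwidth-only lower bound.

```python
def ring_allreduce_bytes_per_gpu(num_gpus: int, payload_bytes: int) -> float:
    """Bytes each GPU sends in a ring all-reduce (reduce-scatter + all-gather).

    Each of the two phases takes num_gpus - 1 steps, and every step moves one
    chunk of payload_bytes / num_gpus, so the total is 2 * (N - 1) / N * S.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be positive")
    return 2 * (num_gpus - 1) * payload_bytes / num_gpus


def ring_allreduce_lower_bound_seconds(num_gpus: int, payload_bytes: int,
                                       link_bytes_per_second: float) -> float:
    """Bandwidth-only lower bound; ignores latency, compute overlap and congestion."""
    return ring_allreduce_bytes_per_gpu(num_gpus, payload_bytes) / link_bytes_per_second


if __name__ == "__main__":
    one_gb = 10**9  # hypothetical gradient payload
    for n in (2, 8, 64):
        sent = ring_allreduce_bytes_per_gpu(n, one_gb)
        print(f"{n} GPUs: each sends {sent / one_gb:.3f} GB")
```

Per-GPU traffic approaches twice the gradient size no matter how many GPUs join. So when data-parallel replicas sit under different leaf switches, that volume crosses the spine on every step, which is one reason topology-aware orchestration matters.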
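The Kubernetes orchestration slide labels nodes by the leaf switch they attach to (aiops.io/switch:leaf1, aiops.io/switch:leaf2). Below is a minimal sketch of a placement decision that uses such a label to keep one training job under a single leaf switch. The node dictionary shape, the best-fit policy, and the function name are all hypothetical; this is not the speakers' scheduler and not a Kubernetes API.

```python
from collections import defaultdict
from typing import Dict, List, Optional

SWITCH_LABEL = "aiops.io/switch"  # node label key as shown on the slide


def pick_nodes_under_one_leaf(nodes: List[dict], gpus_needed: int) -> Optional[List[str]]:
    """Choose nodes that all hang off one leaf switch and together hold enough GPUs.

    `nodes` uses a hypothetical shape: {"name": str, "labels": dict, "free_gpus": int}.
    Returns node names, or None if no single leaf switch has enough free GPUs.
    """
    by_leaf: Dict[str, List[dict]] = defaultdict(list)
    for node in nodes:
        leaf = node["labels"].get(SWITCH_LABEL)
        if leaf is not None and node["free_gpus"] > 0:
            by_leaf[leaf].append(node)

    # Best fit: the leaf with the smallest free capacity that still fits the job,
    # leaving larger leaves available for larger jobs.
    fitting = []
    for leaf, members in by_leaf.items():
        capacity = sum(n["free_gpus"] for n in members)
        if capacity >= gpus_needed:
            fitting.append((capacity, leaf))
    if not fitting:
        return None
    _, best_leaf = min(fitting)

    # Take nodes with the most free GPUs first so the job spans as few nodes as possible.
    chosen: List[str] = []
    remaining = gpus_needed
    for node in sorted(by_leaf[best_leaf], key=lambda n: n["free_gpus"], reverse=True):
        if remaining <= 0:
            break
        chosen.append(node["name"])
        remaining -= node["free_gpus"]
    return chosen


if __name__ == "__main__":
    cluster = [
        {"name": "a", "labels": {SWITCH_LABEL: "leaf1"}, "free_gpus": 8},
        {"name": "b", "labels": {SWITCH_LABEL: "leaf1"}, "free_gpus": 8},
        {"name": "c", "labels": {SWITCH_LABEL: "leaf2"}, "free_gpus": 8},
        {"name": "d", "labels": {SWITCH_LABEL: "leaf2"}, "free_gpus": 8},
        {"name": "e", "labels": {SWITCH_LABEL: "leaf2"}, "free_gpus": 8},
    ]
    print(pick_nodes_under_one_leaf(cluster, 12))
    print(pick_nodes_under_one_leaf(cluster, 20))
    print(pick_nodes_under_one_leaf(cluster, 30))
```

In a real cluster, the same intent would more typically be expressed declaratively, for example with pod affinity whose topologyKey is the switch label, or with a topology-aware scheduler. The sketch only illustrates the decision such a mechanism makes.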