optimize-llm-workflows-with-smart-infrastructure-enhanced-by-volcano-chuan-hui-volcanozha-xia-27dya-shi-llmxiao-xin-li-qihoo360-xuzheng-chang-huawei-cloud-technologies-co-ltd-1.pdf



Optimize LLM Workflows with Smart Infrastructure Enhanced by Volcano

Xin Li, Qihoo360
Xuzheng Chang, Huawei Cloud Technologies Co., LTD

Catalog
1. Background
2. Status
3. Issues
4. Solutions

Background

LLM Keyword Trends
- Since 2023, LLMs have received growing attention.
- A growing share of LLM infrastructure runs on Kubernetes.
- Kubernetes support for LLM workloads keeps improving.
[Figure: LLM Keyword Trends. Labels: OpenAI Blog Post, 2018, 2021, Google Search Results]

Status

- 3000+ users from different departments, 6000+ tasks per month
- 10+ clusters, 1000+ nodes
[Figure labels: Training, Big data, Text, Video, CPU, Memory, NVIDIA, Ascend, Others]

Complexity of task types: training, inference, development.
- Resources: 1-200 instances per task; per instance, CPU 1c-200c, GPU 1-8, memory 20G-2T
- Function: passwordless SSH, pod-to-pod communication
- Operation: all instances of a task are scheduled simultaneously

Complexity of running time: tasks lasting hours, days, and months coexist.

Complexity of computing resources: CPU, GPU, NPU, etc.

Complexity of network environment: Ethernet, IB, RoCE.

[Slide labels: Development, Inference]

Issues

Categories: Failure, Efficiency, Usability

Failure
- GPU lost
- ECC error
- GPU failure
- NIC failure
- Data center power outage
- Misoperation
- NAS failure
- Cluster failure
- NVLINK failure
- P2P failure
- Cooling failure
Source: The Llama 3 Herd of Models, https:/…

- … strategy optimization
- Multiple mission types
- Various hardware
- Massive data transfer

- Environment dependency
- Environment preservation
- Multiple IDE integrations
- Tensorboard, Grafana
- Observability optimization

- Multi-department resource allocation
- Exclusive resources / public resources
- Task preemption
- Task queuing
- Gang scheduling strategy
- Binpack scheduling strategy

- Megatron-LM, DeepSpeed, opensora
- Distributed training tasks
- LLM tasks
- Multimodal tasks
- Data processing
- Single-machine single-card, single-machine multi-card, and multi-machine multi-card tasks

- NVIDIA
- Ascend
- Pure CPU tasks
- RoCE/IB
- GPU slicing

Solutions

Vol…
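
The slides note that all instances of a task are scheduled simultaneously and list gang and binpack scheduling strategies among the items to address. The following is a minimal sketch of those two ideas together, under stated assumptions: it is not Volcano's implementation, and the names `Node` and `gang_schedule`, as well as the GPU-only resource model, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical node model: only free GPUs are tracked."""
    name: str
    free_gpus: int


def gang_schedule(nodes, instances, gpus_per_instance):
    """Place every instance of a task, or none of them (gang semantics).

    Returns a list of node names, one per instance, or None when the
    whole task cannot fit. The input nodes are never modified.
    """
    # Work on a copy so a failed attempt reserves nothing.
    free = {n.name: n.free_gpus for n in nodes}
    placement = []
    for _ in range(instances):
        candidates = [name for name, gpus in free.items()
                      if gpus >= gpus_per_instance]
        if not candidates:
            return None  # one instance does not fit, so reject the whole task
        # Binpack: prefer the fullest node that still fits, to limit fragmentation.
        chosen = min(candidates, key=lambda name: (free[name], name))
        free[chosen] -= gpus_per_instance
        placement.append(chosen)
    return placement


nodes = [Node("a", 8), Node("b", 4)]
print(gang_schedule(nodes, 2, 4))  # ['b', 'a']
print(gang_schedule(nodes, 4, 4))  # None: only 12 GPUs are free, 16 are needed
```

Rejecting the whole task when any single instance cannot be placed is what keeps a distributed training job from holding some GPUs while waiting indefinitely for the rest.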
