1-3 STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training.pdf

ID: 102308, PDF, 26 pages, 3.38 MB

STRONGHOLD: Fast and Affordable Billion-scale Deep Learning Model Training
Wang Wei, NLP Algorithm Expert, DAMO Academy
2022/07/30

Foundation Models
  ML homogenizes learning algorithms (e.g., logistic regression).
  DL homogenizes model architectures (e.g., CNN).
  Foundation models homogenize the model itself (e.g., BERT, GPT-3).
  Figure from "On the Opportunities and Risks of Foundation Models", https://arxiv.org/abs/2108.07258

Foundation Models: Training + Adaptation
  Pretrained on broad unannotated (multimodal) data at scale in a self-supervised way.
  Adapted to a wide range of downstream tasks via fine-tuning.
  One is All
  Figure from "On the Opportunities and Risks of Foundation Models", https://arxiv.org/abs/2108.07258

Model Size vs. HW Capacity
  Transformer size: 2×10^4× per 5 years
  GPU memory: 6× per 5 years
  We need more GPUs!
  [Figure: model size in parameters (B) by date, 2017 to 2022, for dense and sparse models, plotted against GPU memory.
  Models: BERT-base, BERT-large, GPT, GPT-2, GPT-3, Megatron-Turing-NLG, Megatron-LM, T-NLG, T5-base, T5-large, T5-3B, T5-11B, T5-XXL, ALBERT, RoBERTa-large, Zhiyuan-Wudao2.0, Ali-M6, KUAIMODEL, GShard, Switch-base, Switch-large, Switch-XXL, Switch-C.
  Accelerators: P100 (12GB), TPU V2 (16GB), V100 (32GB), TPU V3 (32GB), A100 (40GB), A100 (80GB).]

Data parallelism
  Distribute data across processors.
  Data is processed in parallel, and parameters are updated synchronously.
  Communication happens at the all-reduce operations that sum the gradients from all processors.

Model parallelism
  Pipeline (Inter-Layer) Model Parallelism
    Split sets of layers across multiple devices.
    Layers 0, 1, 2 and layers 3, 4, 5 are on different devices.
  Tensor (Intra-Layer) Model Parallelism
    Split individual layers across multiple devices.
    Both devices compute different parts of layers 0, 1, 2, 3, 4, 5.
  These two approaches are complementary.

Model parallelism
  Pipeline (Inter-Layer) Model Parallelism
    Less communication intensive.
    Generalizable to almost all DNNs.
    Can req
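
The three schemes named in the slides (data parallelism, tensor model parallelism, pipeline model parallelism) can be illustrated with a toy, single-process sketch. It is not taken from the deck and is not STRONGHOLD's implementation: "devices" are simulated as plain list slots, the models are tiny linear maps, and every function name below is hypothetical.

```python
# Toy single-process simulation; a "device" is just a list slot (assumption).

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def grad_sum(w, batch):
    """Sum of per-example squared-error gradients for the model y ~ w . x."""
    g = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            g[i] += 2.0 * err * xi
    return g

def all_reduce_sum(per_device):
    """Sum same-shaped vectors from all devices; every device gets the total."""
    total = [sum(vals) for vals in zip(*per_device)]
    return [list(total) for _ in per_device]

def data_parallel_grad(w, batch, n_devices):
    """Data parallelism: shard the batch, then all-reduce the local gradients."""
    shards = [batch[d::n_devices] for d in range(n_devices)]
    local = [grad_sum(w, shard) for shard in shards]  # parallel on real hardware
    reduced = all_reduce_sum(local)                   # the communication step
    return [g / len(batch) for g in reduced[0]]       # mean gradient

def tensor_parallel_layer(W, x, n_devices):
    """Tensor (intra-layer) parallelism: each device owns a block of W's rows."""
    per = -(-len(W) // n_devices)                     # ceiling division
    parts = [matvec(W[d * per:(d + 1) * per], x) for d in range(n_devices)]
    return [v for part in parts for v in part]        # gather the output pieces

def pipeline_parallel_forward(layers, x, n_stages):
    """Pipeline (inter-layer) parallelism: consecutive layers form one stage."""
    per = -(-len(layers) // n_stages)
    stages = [layers[s * per:(s + 1) * per] for s in range(n_stages)]
    for stage in stages:                              # one stage per device
        for W in stage:
            x = matvec(W, x)
        # on real hardware the activation x is sent to the next device here
    return x

w = [1.0, -2.0]
batch = [([1.0, 2.0], 3.0), ([0.0, 1.0], -1.0),
         ([2.0, 1.0], 0.0), ([1.0, 1.0], 1.0)]
print(data_parallel_grad(w, batch, 2))                # matches the 1-device result

W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(tensor_parallel_layer(W, [1.0, -1.0], 2))

layers = [[[1.0, 1.0], [0.0, 1.0]] for _ in range(6)]  # layers 0..5
print(pipeline_parallel_forward(layers, [1.0, -1.0], 2))
```

In this sketch each scheme returns the same result as the single-device computation; only where the work and the communication happen differs. Data parallelism communicates gradients once per step, tensor parallelism communicates inside every layer, and pipeline parallelism communicates only at stage boundaries, which fits the slide's remark that the pipeline approach is less communication intensive.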
