2020 Year-End Conference - Machine Learning Platform: 13-3.pdf

Uploaded by: li

ID: 29907

2020-12-01

PDF, 35 pages, 1.69MB


Data Provider Solution for DLT on Brain+
Yang Yang, Megvii (旷视科技)

Outline
  Background
  Bottleneck Analysis
  Solution
  Future and Outlook

Background

Deep Learning training (DLT): an important workload on clusters
  Widely applied:
    Image Classification
    Object Detection
    Natural Language Processing
    Recommender Systems
  Vision workloads are data-intensive:
    ImageNet-1K: 1.28 million images
    Open Image: 9 million images
  Costly resources, i.e., GPUs:
    Training the well-known ResNet-50 model on the ImageNet-1K dataset takes more than 30 hours in a cluster

Brain+: a productivity platform for DLT
  Abstracts the infrastructure: CPU/GPU/Memory/Storage
  Researcher-friendly rather than engineer-friendly
  Engineers the DLT workflow to save researchers as much time as possible
  Provides easy-to-use, efficient, customized infrastructure
  Core goal: free up researchers' productivity

Problem focus
  The data-supply problem: feed data to the model quickly and well
  Problem characteristics:
    Large-scale datasets
    Complex data augmentation strategies running on the CPU
    Significant data reuse

Bottleneck Analysis

Example Workload
  ResNet50 is a popular vision model
  Process 10,500 images/sec on 8 Nvidia V100s
  Goal: Keep GPUs busy and utilize them efficiently
  Remote store with several TBs of training data
  2GB/s
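The example workload above invites a back-of-the-envelope check: at what average image size does a 2GB/s storage path stop keeping up with GPUs that consume 10,500 images/sec? The sketch below is only an illustration of that arithmetic, not the analysis from the slides. The two constants come from the slide text (with GB read as 10^9 bytes, which is an assumption); the function names and the candidate image sizes are hypothetical, since the preview does not state an average image size.

```python
# Figures quoted on the slide.
GPU_IMAGES_PER_SEC = 10_500        # ResNet50 on 8 Nvidia V100s
STORE_BYTES_PER_SEC = 2e9          # "2GB/s", assuming decimal gigabytes

# Hypothetical average encoded image sizes in bytes (not from the slides).
ASSUMED_IMAGE_SIZES = (50_000, 100_000, 200_000, 500_000)


def required_read_bandwidth(images_per_sec, avg_image_bytes):
    """Bytes per second the storage path must deliver to sustain the image rate."""
    return images_per_sec * avg_image_bytes


def max_images_per_sec(store_bytes_per_sec, avg_image_bytes):
    """Highest image rate the storage path can feed at the given image size."""
    return store_bytes_per_sec / avg_image_bytes


def is_storage_bound(images_per_sec, store_bytes_per_sec, avg_image_bytes):
    """True when the GPUs would consume data faster than storage can supply it."""
    needed = required_read_bandwidth(images_per_sec, avg_image_bytes)
    return needed > store_bytes_per_sec


if __name__ == "__main__":
    for size in ASSUMED_IMAGE_SIZES:
        needed = required_read_bandwidth(GPU_IMAGES_PER_SEC, size)
        bound = is_storage_bound(GPU_IMAGES_PER_SEC, STORE_BYTES_PER_SEC, size)
        print(f"avg image {size} B: need {needed / 1e9:.2f} GB/s, storage-bound: {bound}")
```

Under these assumptions the break-even point is simply the storage rate divided by the image rate, so any dataset whose average image is larger than that would leave the GPUs waiting unless data is cached or reused closer to the trainers.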
