
High-Performance Networking Accelerates Intelligent Recommender Systems.pdf

Uploaded by: li | ID: 29555 | 2021-02-07 | PDF | 29 pages | 4 MB

This report belongs to the collection: speaker slide decks from the GTC China 2020 online conference



NVIDIA HIGH PERFORMANCE E2E ETHERNET SOLUTION ACCELERATE RECOMMENDER SYSTEM
GTC China, Oct 2020

Recommendation Pipelines: Example
[Figure: example recommendation pipeline. Recoverable labels: Experimentation; Data lake; Train data; Feature engineering; Data pre-processing; TBs to PBs; Model(s) training; GBs to TBs; Production Inference; Production Re-training; O(10); Recommender System; Improved accuracy?; Candidate generation; weekly/(illegible)]

Recommendation Pipelines: Challenges
Stages: Data (ETL), Training, Inference
Challenge areas: Feature exploration; Data loading; Huge embedding tables; Throughput & Accuracy; Performance & Latency

Huge data sets: TBs, PBs or more. Complex data preprocessing and feature engineering pipelines. Many iterations required.
Data loading can be 50% of total training time. Tabular data loading scales poorly with item-by-item approach.
Large embedding tables exceed single GPU memory. Sub-optimal lookups ops implementation.
Hard to achieve high scaling efficiency with both model and data parallelism. Longer iteration cycles reduce the ability to reach higher accuracies quickly.
Difficult to have high throughput and low latency when ranking huge number of items.

Nvidia Ethernet Switch address the challenges
Speed, Feed and Latency: fast interconnect; fast access dataset
RDMA and RoCE: low latency access GPU memory; low latency access external dataset
Monitoring and Management

SPEED AND FEED: THE NEED OF BANDWIDTH
Intra-layer model parallel / Data parallel
Intra-layer model parallel leaves collectives exposed.
Communication speedup must match math speedup, otherwise we achieve little E2E speedup.
Accelerating math without acceleration communication suffers from basic Amdahl's law problem.
Typically collectives span NVLink domain only.
Allreduce spans both NVLink and networking domains: bandwidth must be available in each.

NVIDIA'S MULTI-GPU, MULTI-NODE NETWORKING AND STORAGE IO OPTIMIZATION STACK
Build l [preview truncated]
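The bandwidth slide's Amdahl's law point can be illustrated with a small calculation. This is only a sketch under stated assumptions: the function name, the 50% communication share, and the 4x speedup figures are hypothetical values chosen for illustration and do not come from the slides.

```python
def end_to_end_speedup(comm_fraction, math_speedup, comm_speedup=1.0):
    """Amdahl's-law estimate of the overall speedup of one training step.

    comm_fraction: share of the original step time spent in communication
                   (hypothetical input, between 0 and 1).
    math_speedup:  factor by which the compute ("math") part gets faster.
    comm_speedup:  factor by which the communication part gets faster.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be between 0 and 1")
    if math_speedup <= 0 or comm_speedup <= 0:
        raise ValueError("speedup factors must be positive")
    math_fraction = 1.0 - comm_fraction
    # New step time, normalized so the original step time is 1.0.
    new_time = math_fraction / math_speedup + comm_fraction / comm_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Assumed example: half of each step is exposed communication.
    only_math = end_to_end_speedup(0.5, math_speedup=4.0)
    both = end_to_end_speedup(0.5, math_speedup=4.0, comm_speedup=4.0)
    print(f"4x math, unchanged communication: {only_math:.2f}x end to end")
    print(f"4x math, 4x communication:        {both:.2f}x end to end")
```

With these assumed numbers, a 4x faster compute phase alone yields only 1.6x end to end, while speeding up communication by the same factor recovers the full 4x, which is the sense in which communication speedup must match math speedup.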


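This presentation covers NVIDIA's innovations and progress in networking solutions for AI clusters. With its Ethernet switch products, NVIDIA delivers a high-performance, low-latency end-to-end Ethernet solution that improves the speed and efficiency of data-parallel communication. The presentation stresses the importance of RDMA and RoCE for accelerating AI frameworks such as Cognitive Toolkit, and explains how they reduce communication latency through direct access to GPU memory. NVIDIA's solution also supports mixed RDMA and non-RDMA deployments and can be managed end to end with its NEO network management software. The presentation further notes that NVIDIA's networking products support RoCE over VxLAN and provide advanced congestion control and traffic management. Finally, NVIDIA's WJH™ monitoring system provides detailed data that helps locate network problems quickly and optimize network performance. Key points include faster recommender systems, low-latency access to GPU memory and external datasets, and a high-performance network architecture supporting up to 65,000 non-blocking 100GbE ports.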

