High-Performance Networking to Accelerate Intelligent Recommender Systems (PDF, 29 pages)
NVIDIA HIGH-PERFORMANCE E2E ETHERNET SOLUTION TO ACCELERATE RECOMMENDER SYSTEMS
GTC China, Oct 2020

Recommendation Pipelines: Example
[Figure: example recommendation pipeline. Experimentation: data lake (TBs to PBs), data pre-processing, feature engineering, model(s) training on train data (GBs to TBs). Production: data preprocessing, feature engineering, candidate generation, model(s) training, production inference and weekly production re-training, annotated "Improved accuracy?"]

Recommendation Pipelines: Challenges
Pipeline stages: Data (ETL), Training, Inference.
- Feature exploration: Huge data sets of TBs, PBs or more. Complex data preprocessing and feature engineering pipelines. Many iterations required.
- Data loading: Data loading can be 50% of total training time. Tabular data loading scales poorly with an item-by-item approach.
- Huge embedding tables: Large embedding tables exceed single-GPU memory. Sub-optimal lookup ops implementation.
- Throughput and accuracy: Hard to achieve high scaling efficiency with both model and data parallelism. Longer iteration cycles reduce the ability to reach higher accuracies quickly.
- Performance and latency: Difficult to have high throughput and low latency when ranking a huge number of items.

NVIDIA Ethernet Switch Addresses the Challenges
- Speed, feed and latency: fast interconnect
- Fast access to datasets
- RDMA and RoCE
- Low-latency access to GPU memory
- Low-latency access to external datasets
- Monitoring and management

SPEED AND FEED: THE NEED FOR BANDWIDTH
[Figure panels: intra-layer model parallel; data parallel]
- Intra-layer model parallelism leaves collectives exposed. Typically these collectives span the NVLink domain only.
- Data parallelism: allreduce spans both the NVLink and the networking domains, so bandwidth must be available in each.
- Communication speedup must match math speedup. Otherwise communication suffers from the basic Amdahl's law problem: if we accelerate math without accelerating communication, we achieve little end-to-end (E2E) speedup.

NVIDIA'S MULTI-GPU, MULTI-NODE NETWORKING AND STORAGE IO OPTIMIZATION STACK
Build l
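The Amdahl's law point on the Speed and Feed slide can be made concrete with a short calculation. This is a minimal sketch, not NVIDIA's model: it assumes a training step is a math part followed by a communication part with no overlap (the exposed-collectives case), so the end-to-end speedup is S = 1 / ((1 - f)/s_math + f/s_comm), where f is the share of the original step spent communicating. The function name `e2e_speedup` and the 30% communication share in the demo are hypothetical illustration values.

```python
import math


def e2e_speedup(comm_fraction: float, math_speedup: float, comm_speedup: float) -> float:
    """Amdahl-style end-to-end speedup of one training step (illustrative sketch).

    Assumption: the step is a math part plus a communication part with no
    overlap, i.e. the collectives are fully exposed.
    comm_fraction: share of the original step time spent communicating (0..1).
    math_speedup, comm_speedup: factors by which each part gets faster.
    """
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must be in [0, 1]")
    if math_speedup <= 0 or comm_speedup <= 0:
        raise ValueError("speedups must be positive")
    # New step time, normalized so that the original step time is 1.0.
    new_time = (1.0 - comm_fraction) / math_speedup + comm_fraction / comm_speedup
    if new_time == 0.0:
        return math.inf  # every part became infinitely fast
    return 1.0 / new_time


if __name__ == "__main__":
    f = 0.3  # hypothetical: 30% of the step is exposed communication
    for label, s_math, s_comm in [
        ("10x math, network unchanged", 10.0, 1.0),
        ("10x math, 10x network", 10.0, 10.0),
        ("infinitely fast math, network unchanged", math.inf, 1.0),
    ]:
        print(f"{label}: {e2e_speedup(f, s_math, s_comm):.2f}x")
```

With these assumed numbers, a 10x faster GPU alone yields only about 2.70x end to end, and even infinitely fast math is capped at about 3.33x (1/f); only scaling the network by the same 10x recovers the full 10x. This is the sense in which communication speedup must match math speedup.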