NVIDIA
INTEGRATE HUGECTR EMBEDDING WITH TENSORFLOW
董建兵 (Jianbing Dong), Dec. 17th, 2020
#page#
HugeCTR: Scalable, Accelerated Training
https:/
HugeCTR is a highly efficient GPU framework and reference design dedicated to Click-Through-Rate (CTR) estimation training.
Fast: Fastest available solution in MLPerf v0.7. Achieves a speedup of up to 114X over TensorFlow on a 40-core CPU node, and up to 8.3X that of TensorFlow with a single V100 GPU.
Dedicated:
- Training terabyte-scale models on single or multiple nodes
- GPU hash table with dynamic insertion for stream training
- Supports a variety of recommendation models: WDL / DCN / DeepFM / DLRM, etc.
Easy to use and flexible: Python/C++ interface, JSON-based network configuration
#page#
Framework for Recommendation System
Embedding for stream training
- High-performance GPU hash table based on cudf
- Supports dynamic insertion
- Resolves collisions on the fly
Unified Embedding
- All embeddings (multi feature fields) in one
[Figure: Embedding blocks backed by several Hashtable blocks]
Fused Computation / Transaction / Update
- Sorting-based parameter update to reduce memory footprint
Sparse Inputs
- Native multi-hot support
- Distributes embeddings across multiple GPUs
#page#
AGENDA
Introduction: What is special about HugeCTR Embedding
Usage Guide & Samples: How to define DNN models with the plugin
Performance: Performance comparison
#page#
INTRODUCTION
#page#
HugeCTR Embedding
[Figure: Input Data -> Hash Table -> Embedding Table -> Output; dimension labels: max_nnz, batch_size * slot num]
- Unifies multiple slots (feature fields) into one embedding table.
- GPU hash table supports dynamic insertion in stream training.
#page#
HugeCTR Embedding
[Figure: Input Data is split into CSR 1 ... CSR n; each CSR i feeds Hash Table i and Embedding Table i; a Reduce Scatter step produces Output 1 ... Output n]
The hash table and the embedding table are both split. When looking up embedding vectors, each GPU works independently. After t