BAGUA: A Modular & Efficient Communication Layer for Distributed Learning

Xiangru Lian, Rui Wang, Hongmei Shi, Shengzhuo Zhang, Yan Tian, Ji Liu (Kuaishou)
Shaoduo Gan, Jiawei Jiang, Ce Zhang (ETH Zurich)

Deep Learning Eating the World

- Fascinating progress over the last decade, fueled by increasingly large datasets and computation power. The availability of computation changes our view on what we can do: an EC2 g2.8xlarge instance (2015; 4x GRID K520, 4.89 TFLOPS) cost $2.60/hour; a p4d.24xlarge costs $32.77/hour.
- What used to take weeks now takes hours (think about ImageNet); but users get "greedy" with new tasks that take weeks on a beefy machine (think about BERT).
- At Kuaishou: recommendation systems (300+M daily users' behaviors), multimedia understanding (dozens of thousands of years of watch time accumulated daily), and video preprocessing (~200 videos uploaded per second).
Existing Systems and Our Vision

- Amazing techniques, amazing systems; the GAP: current amazing systems don't support recently developed amazing techniques.
  - Techniques: decentralized training, asynchronous training, communication quantization, communication sparsification.
  - Systems: ByteDance BytePS, NVIDIA Apex, Microsoft DeepSpeed.
- Question: how do we accommodate the ever-growing demand of ML training over an ever-growing scale of data?
  - Could be algorithmic solutions: better models, algorithms, optimizers.
  - Could be system solutions: SCALE, SCALE, SCALE.
- OUR GOAL: distributed learning with SOTA communication optimization techniques: BAGUA.
- We focus on scaling with data parallelism: each worker holds a partition of the data, and the workers jointly train a single ML model (a minimal sketch follows below).
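To make the data-parallel setting concrete, here is a minimal sketch of one training step using plain PyTorch and torch.distributed (deliberately not Bagua's own API): each worker computes gradients on its own data shard, and a synchronous allreduce averages them before the optimizer step. Process-group setup, the model, and the data loader are assumed.

```python
# Minimal data-parallel training step: every worker holds a data partition
# and the workers jointly train one model by averaging gradients.
# Plain torch.distributed is used here for illustration, not Bagua's API.
import torch
import torch.distributed as dist

def train_step(model, optimizer, loss_fn, inputs, targets):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    # Centralized, synchronous, lossless communication: sum gradients
    # across all workers, then divide to get the average.
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()
```

This per-step allreduce is the baseline pattern that the communication techniques on the next slide (decentralization, asynchrony, compression) each relax in a different way.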
Bagua: Modular Communication

- Primitive communication patterns / primitive logical channels: centralized vs. decentralized ("gossip"), synchronous vs. asynchronous, and lossless vs. quantization vs. sparsification.
- Modular composition: these primitives compose into concrete distributed training algorithms.
- Readily plugs into the deep learning training stack: optimization algorithm (SGD, ADAM), deep learning library, ...
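As a hedged illustration of what this modular composition looks like to a user, the sketch below follows Bagua's public PyTorch API (bagua.torch_api); exact module and class names may vary across Bagua versions, so treat it as a sketch rather than the definitive interface. Switching the communication pattern means swapping one algorithm object; the training loop itself is untouched.

```python
# Sketch of composing communication primitives in Bagua (names follow the
# public bagua.torch_api; version-dependent).
import torch
import bagua.torch_api as bagua
from bagua.torch_api.algorithms import (
    gradient_allreduce,  # centralized, synchronous, lossless
    bytegrad,            # centralized, synchronous, quantized
    decentralized,       # decentralized ("gossip"), synchronous
)

torch.cuda.set_device(bagua.get_local_rank())
bagua.init_process_group()

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Pick one communication pattern; everything else stays the same.
algorithm = gradient_allreduce.GradientAllReduceAlgorithm()
# algorithm = bytegrad.ByteGradAlgorithm()
# algorithm = decentralized.DecentralizedAlgorithm()

model = model.with_bagua([optimizer], algorithm)
```

The design point: decentralized, asynchronous, and compressed variants become one-line configuration changes instead of separate training systems.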