In this talk we describe the AI workload requirements driving larger GPU clusters and connect them to the IO demands of future accelerator packages. Effectively escaping IO from future accelerator packages is a critical technology challenge. Scaling electrical signaling solutions is becoming more challenging, and integrated optics solutions, with their high bandwidth and high bandwidth density, show promise for addressing these package- and rack-scale challenges.

Optics in AI Clusters: Meta Platforms Perspective
Drew Alduino, Optical Engineer, AI/ML Systems, Meta Platforms
SPECIAL FOCUS: OPTICS

AI Hardware and Model Trends
LLM model sizes continue to increase.
IO and memory hardware growth continues to lag.
HW FLOPs are increasing faster than IO hardware, but still lag model size growth.
Model size: x2 every 4 months
Component capacity: x2 every 24 months
From: https:/

AI Cluster Growth Motivation
LLM models improve (L) with more parameters (N) and larger training data sets (D).
Training cluster FLOPs increase as parameter count and training set size increase (at constant training time).
FLOPs growth can come from FLOPs per GPU (accelerator) and from the number of GPUs in the AI cluster.
As the number of GPUs in the AI cluster increases, the network requirements increase.

$$C = C_0 N D$$
$$L = \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} + L_0$$

L is the figure of merit, lower is better (the average negative log-likelihood loss per token)
N = number of parameters in the model
D = number of tokens in the data set
C = cost of training in FLOPs (FLOP-days)
Empirical constants: $C_0$ (6), $A$, $B$, $L_0$, and the exponents $\alpha$ and $\beta$
From Wikipedia: https://en.wikipedia.org/wiki/Large_language_model#cite_note-kaplan-scaling-43
Figure: test loss (L). Public reference: https://arxiv.org/pdf/2001.08361.pdf

Difficult to serve all classes of models with a single system design point.
New models and parallelism techniques put unexpected pressures on AI systems.
The next