Rethinking Large Language Models: Efficiency and Performance
ShangHai

Outline
1. Trends of Large Models
2. History and problem overview
3. Parallelism Strategies & Tricks
4. Kernels and Compiler
5. Future

Outline
1. Trends of Large Models
2. History and problem overview
3. Parallelism Strategies & Tricks
4. New Applications
Trends of Large Models

Powering the computing industry for decades
Dennard scaling: if transistor density doubles, power consumption (with twice the number of transistors) stays the same.
Huang's law states that the performance of GPUs will more than double every two years.
Between 2006 and 2021, GPU price-performance (in terms of FLOPS/$) has tended to double approximately every 2.5 years.
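To put the 2.5-year doubling in perspective, a back-of-the-envelope compounding calculation (my own illustration, not from the slides) over the 2006-2021 window gives:

2^{(2021 - 2006)/2.5} = 2^{6} = 64\times \ \text{improvement in FLOPS/\$}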
ML models' compute demands skyrocket
Sevilla, Jaime, et al. "Compute trends across three eras of machine learning." 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022.

Larger model, larger corpus, better accuracy
(Figure: performance improves with corpus increase and #parameter increase.)
Kaplan, Jared, et al. "Scaling laws for neural language models." arXiv preprint arXiv:2001.08361 (2020).
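For reference, the scaling laws in Kaplan et al. take a power-law form in the parameter count N and the dataset size D; a sketch of the single-variable and combined forms is below, where the constants N_c, D_c and exponents \alpha_N, \alpha_D are empirical fits reported in the paper:

L(N) = \left( \frac{N_c}{N} \right)^{\alpha_N}, \qquad L(D) = \left( \frac{D_c}{D} \right)^{\alpha_D}

L(N, D) = \left[ \left( \frac{N_c}{N} \right)^{\alpha_N / \alpha_D} + \frac{D_c}{D} \right]^{\alpha_D}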
Emergence of LLMs

History and problem overview

Brief History of Large Models
2012-2016: DistBelief; Parameter Server; Bosen; GeePS
2018: TensorFlow allreduce (Baidu); Horovod; DDP; compute graph and placement
Transformer and its variants; pipeline parallelism; large-scale model parallelism
2020: Large language models with few-shot learning; PaLM: Pathways (Google); CLIP (OpenAI), connecting images and text
Large Model: Sparse Models; Large Model: Deep Models; Large Model: Foundation Models

Problem Overview: LLMs Need Huge FLOPS
Transformer FLOPs equation: C ≈ 6ND
N: the number of parameters; D: the number of tokens the model is trained on.

LLM          #Parameters (billion)   #Tokens (billion)   Model FLOPs
GPT-3        175                     300                 3.15E+23
LLaMA-65B    65                      1400                5.46E+23
LLaMA2-70B   70                      2000                8.4E+23
PaLM         540                     780                 2.5272E+24
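As a quick sanity check of the C ≈ 6ND estimate, a small Python sketch (the helper name is mine; the parameter and token counts are taken from the table above) reproduces the Model FLOPs column:

# Rough training-compute estimate for a Transformer: C ≈ 6 * N * D,
# with N = parameter count and D = training tokens (both in absolute units).
def training_flops(params_billion: float, tokens_billion: float) -> float:
    return 6 * (params_billion * 1e9) * (tokens_billion * 1e9)

models = {
    "GPT-3":      (175, 300),
    "LLaMA-65B":  (65, 1400),
    "LLaMA2-70B": (70, 2000),
    "PaLM":       (540, 780),
}

for name, (n_b, d_b) in models.items():
    print(f"{name:12s} {training_flops(n_b, d_b):.3e} FLOPs")
# GPT-3 -> 3.150e+23, LLaMA-65B -> 5.460e+23, LLaMA2-70B -> 8.400e+23, PaLM -> 2.527e+24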
Problem Overview: Gap between LLMs and Accelerators