6567 - LLM Inference Performance Projection.pdf

LLM Inference Performance Projection
Mohan J Kumar, Perpetual Intelligence; Edmund Song, Intel

Agenda: AI Market Trends; AI Refresher (Training vs. Inference, Types of Parallelism); MESA LLM Inference Performance Projection (Overview and Examples, Unique Attributes); Summary and Call to Action.

AI Trends: Generative AI revenue is expected to reach $1T by 2032¹, and generative AI will be 12% of technology spend by 2032¹. The global AI market will reach $1.7T by 2030². The global AI inference market is expected to grow at an 18.4% CAGR to $133B by 2034³, and inference is expected to account for 90% of AI computing by 2030.
Sources: ¹ Bloomberg; ² Grand View Research; ³ Market.us

Training vs. Inference: During training, the model runs a forward pass over a large labeled dataset (e.g., "Car?"), measures the error of its prediction, and runs a backward pass to adjust its parameters so as to minimize that error. During inference, the trained model takes new data as input and runs only the forward pass to output a prediction (e.g., "Car").
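To make the forward/backward distinction concrete, here is a minimal sketch of one training step versus an inference call for a toy linear model in NumPy; the least-squares loss, learning rate, and array shapes are illustrative assumptions, not details from the deck.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 1))           # model parameters

def forward(x, W):
    return x @ W                      # forward pass: input -> prediction

def train_step(x, y, W, lr=0.01):
    pred = forward(x, W)              # forward pass over labeled data
    err = pred - y                    # error vs. the label ("Car?")
    grad = x.T @ err / len(x)         # backward pass: gradient of the MSE loss
    return W - lr * grad              # adjust parameters to minimize the error

def infer(x, W):
    return forward(x, W)              # inference: forward pass only, no labels

# training iterates over a large labeled dataset...
x_batch, y_batch = rng.normal(size=(8, 4)), rng.normal(size=(8, 1))
W = train_step(x_batch, y_batch, W)
# ...inference takes new data and outputs a prediction
print(infer(rng.normal(size=(1, 4)), W))
```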

Types of Parallelism

Data Parallelism: multiple full copies of the model are placed on different GPUs or AI clusters (GPU0 through GPU3 in the illustration), increasing overall throughput by processing multiple requests in parallel.

Sequence Parallelism: a long input sequence is split into chunks (Seq1 through Seq4) across multiple GPUs or AI clusters. It is more commonly used in inference and allows handling long sequences without running out of memory. Sketches of both schemes follow.
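A minimal sketch of the data-parallel pattern, simulating the four device replicas with plain NumPy slices rather than real GPUs; the model is reduced to a single weight matrix and the batch size is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))                   # one full model copy per device

requests = rng.normal(size=(32, 16))            # a batch of independent requests
num_devices = 4
shards = np.array_split(requests, num_devices)  # each device gets its own slice

# every device holds a full replica of W and serves its shard independently,
# so throughput scales with the number of replicas
outputs = [shard @ W for shard in shards]       # "GPU0..GPU3" in the illustration
result = np.concatenate(outputs)                # same answer as requests @ W
assert np.allclose(result, requests @ W)
```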

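For the sequence-parallel case above, a sketch that splits one long sequence across simulated devices; only a per-token (element-wise) layer is shown, since such layers can process each chunk independently, while attention across chunk boundaries would additionally require communication between devices.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, num_devices = 8192, 64, 4

tokens = rng.normal(size=(seq_len, d_model))     # one long input sequence
chunks = np.array_split(tokens, num_devices)     # Seq1..Seq4, one chunk per device

W_up = rng.normal(size=(d_model, d_model))

# each device only materializes seq_len / num_devices activations,
# which is what keeps long sequences within per-device memory limits
partials = [np.maximum(chunk @ W_up, 0.0) for chunk in chunks]   # per-token MLP
full = np.concatenate(partials)
assert full.shape == (seq_len, d_model)
```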
Tensor Parallelism: the model itself is split across multiple GPUs or AI clusters (e.g., split across 4 GPUs in the illustration), supporting large models that do not fit within the memory constraints of a single unit (GPU or AI cluster).

Pipeline Parallelism: the model's layers are split across multiple GPUs or AI clusters (e.g., 4 layers across GPU0 through GPU3 in the illustration), giving better utilization of the AI hardware. Sketches of both follow.
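A sketch of tensor parallelism as a column-split linear layer: the weight matrix is divided across four simulated devices, each computes a partial output from the same activations, and the partial results are concatenated (standing in for the all-gather a real implementation performs over the interconnect). The dimensions are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, num_devices = 64, 256, 4

x = rng.normal(size=(8, d_in))                 # activations, replicated on all devices
W = rng.normal(size=(d_in, d_out))             # assumed too large for one device

# split the weight matrix column-wise: each device stores only d_out/4 columns
W_shards = np.split(W, num_devices, axis=1)

partial_outputs = [x @ Wi for Wi in W_shards]  # each device computes its slice
y = np.concatenate(partial_outputs, axis=1)    # gather the partial results
assert np.allclose(y, x @ W)
```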

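And a sketch of pipeline parallelism with four layers mapped to four simulated stages. A production schedule overlaps micro-batches so that every stage stays busy; this loop shows only the layer-to-stage mapping and the stage-to-stage flow of activations.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
# four layers, one per "GPU0..GPU3" stage, as in the illustration
stage_weights = [rng.normal(size=(d, d)) * 0.1 for _ in range(4)]

def stage(h, W):
    return np.maximum(h @ W, 0.0)               # one pipeline stage = one layer here

micro_batches = np.array_split(rng.normal(size=(16, d)), 4)

outputs = []
for mb in micro_batches:                        # activations flow stage -> stage
    h = mb
    for W in stage_weights:                     # GPU0 -> GPU1 -> GPU2 -> GPU3
        h = stage(h, W)
    outputs.append(h)
result = np.concatenate(outputs)
```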
Expert Parallelism: allows spreading experts across multiple GPUs, AI accelerators, or AI clusters. Only a subset of experts is activated per input, avoiding redundant computation.
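A sketch of the expert-parallel (mixture-of-experts) routing idea: a gating function scores the experts for each token and only the top-k are run, so most expert weights stay idle for any given input. The eight experts, top-2 routing, and softmax-over-selected-experts gating are illustrative assumptions, and the experts are held in a Python list rather than spread over real devices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, top_k = 64, 8, 2

experts = [rng.normal(size=(d, d)) * 0.1 for _ in range(num_experts)]  # spread over devices
W_gate = rng.normal(size=(d, num_experts))

def moe_layer(tokens):
    logits = tokens @ W_gate                              # gating scores per token
    out = np.zeros_like(tokens)
    for t, tok in enumerate(tokens):
        chosen = np.argsort(logits[t])[-top_k:]           # top-k experts for this token
        weights = np.exp(logits[t, chosen])
        weights /= weights.sum()                          # softmax over the chosen experts
        for w, e in zip(weights, chosen):                 # only k of num_experts run
            out[t] += w * np.maximum(tok @ experts[e], 0.0)
    return out

print(moe_layer(rng.normal(size=(4, d))).shape)           # (4, 64)
```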
