
Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREE.pdf

Uploader: c** | ID: 955327 | 2025-10-27 | PDF | 17 pages | 1.73 MB

Collection: RISC-V Summit Europe 2025 speaker presentation slides



Accelerating GenAI Workloads by Enabling RISC-V Microkernel Support in IREE
Adeel Ahmad, Ahmad Tameem, Nouman Amir, Bilal Zafar, Saad Bin Nasir
10xEngineers

Outline
- Generative AI workloads
- IREE compilation with custom microkernels (ukernels)
- Custom RISC-V matrix multiplication ukernels: implementation
- Kernel- and model-level results
- Summary

Generative AI Workloads
Conversational LLMs
- Generative AI workloads are dominated by transformer-based, auto-regressive large language models (LLMs).
- Text, image, and code generation, chatbots, content writing, video generation, and other common use cases rely heavily on LLMs.
- Matrix-matrix and matrix-vector multiplications dominate these workloads.
(Source: ChatGPT)

IREE Compilation with Custom Kernels
- Open-source, MLIR-based compiler and runtime with direct code generation
- Host/device programming model that targets multiple architectures through a hardware abstraction layer (HAL); the stack is mostly architecture-agnostic, a step towards heterogeneous compilation
- The host does the scheduling and uses VM bytecode for runtime portability
- Device-side codegen: upstream IREE has RVV codegen through LLVM

Microkernels
- Intended to prevent the dichotomy between compiler and kernels
- Perform arithmetic but no memory allocation
- Standalone development and unit testing in C leads to quicker development

Matrix Multiplication ukernel (mmt4d) Compilation in IREE
- For x86_64 and ARM64, IREE leverages the linalg dialect's mmt4d op for matrix multiplication.
- The mmt4d op is meticulously optimized to exploit hardware-specific vector instructions and cache hierarchies.
[Figure: pass pipeline for matmul.mlir (only the relevant parts of the MLIR and pass pipeline are shown): matmul → MaterializeHostEncodingPass → pack + mmt4d + unpack → CPULowerToUKernelsPass → mmt4d as an iree_uk_mmt4d ukernel → LowerUKernelOpsToCallsPass → ukernel call → ConvertToLLVMPass, linked with precompiled ukernel bitcode (ukernel_bitcode_*.bc)]
Static l…
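The slides describe matmul being rewritten into pack + mmt4d + unpack before the mmt4d is lowered to a ukernel call. Below is a minimal C sketch of that data-tiling scheme. It assumes the linalg.mmt4d indexing convention: LHS tiled as M1×K1×M0×K0, RHS transposed and tiled as N1×K1×N0×K0, and accumulator M1×N1×M0×N0. Ragged edges are zero-padded. The tile sizes M0, N0, K0 and all function names are hypothetical and chosen for illustration only. This is not IREE's iree_uk_mmt4d implementation, which picks tile sizes per target and uses vector instructions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tile sizes chosen for illustration only. */
enum { M0 = 2, N0 = 4, K0 = 2 };

static int ceil_div(int a, int b) { return (a + b - 1) / b; }

/* Pack row-major A[m][k] into A4[M1][K1][M0][K0], zero-padding ragged edges. */
static void pack_lhs(const float *a, int m, int k, float *a4) {
  int m1n = ceil_div(m, M0), k1n = ceil_div(k, K0);
  for (int m1 = 0; m1 < m1n; ++m1)
    for (int k1 = 0; k1 < k1n; ++k1)
      for (int m0 = 0; m0 < M0; ++m0)
        for (int k0 = 0; k0 < K0; ++k0) {
          int i = m1 * M0 + m0, j = k1 * K0 + k0;
          a4[((m1 * k1n + k1) * M0 + m0) * K0 + k0] =
              (i < m && j < k) ? a[i * k + j] : 0.0f;
        }
}

/* Pack row-major B[k][n] transposed into B4[N1][K1][N0][K0] (the "t" in mmt4d). */
static void pack_rhs(const float *b, int k, int n, float *b4) {
  int n1n = ceil_div(n, N0), k1n = ceil_div(k, K0);
  for (int n1 = 0; n1 < n1n; ++n1)
    for (int k1 = 0; k1 < k1n; ++k1)
      for (int n0 = 0; n0 < N0; ++n0)
        for (int k0 = 0; k0 < K0; ++k0) {
          int i = k1 * K0 + k0, j = n1 * N0 + n0;
          b4[((n1 * k1n + k1) * N0 + n0) * K0 + k0] =
              (i < k && j < n) ? b[i * n + j] : 0.0f;
        }
}

/* C4[m1][n1][m0][n0] += sum over k1, k0 of A4[m1][k1][m0][k0] * B4[n1][k1][n0][k0]. */
static void mmt4d(const float *a4, const float *b4, float *c4,
                  int m1n, int n1n, int k1n) {
  for (int m1 = 0; m1 < m1n; ++m1)
    for (int n1 = 0; n1 < n1n; ++n1) {
      float *acc = c4 + (m1 * n1n + n1) * M0 * N0; /* one M0 x N0 output tile */
      for (int k1 = 0; k1 < k1n; ++k1) {
        const float *lt = a4 + (m1 * k1n + k1) * M0 * K0;
        const float *rt = b4 + (n1 * k1n + k1) * N0 * K0;
        for (int m0 = 0; m0 < M0; ++m0)
          for (int n0 = 0; n0 < N0; ++n0)
            for (int k0 = 0; k0 < K0; ++k0)
              acc[m0 * N0 + n0] += lt[m0 * K0 + k0] * rt[n0 * K0 + k0];
      }
    }
}

/* Unpack C4[M1][N1][M0][N0] into row-major C[m][n], dropping the padding. */
static void unpack(const float *c4, int m, int n, float *c) {
  int n1n = ceil_div(n, N0);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      c[i * n + j] =
          c4[(((i / M0) * n1n + j / N0) * M0 + i % M0) * N0 + j % N0];
}

/* matmul = pack + mmt4d + unpack. Returns 0 on success, -1 on allocation failure. */
static int matmul(const float *a, const float *b, float *c, int m, int k, int n) {
  assert(m > 0 && k > 0 && n > 0);
  int m1n = ceil_div(m, M0), n1n = ceil_div(n, N0), k1n = ceil_div(k, K0);
  float *a4 = malloc(sizeof(float) * m1n * k1n * M0 * K0);
  float *b4 = malloc(sizeof(float) * n1n * k1n * N0 * K0);
  float *c4 = calloc((size_t)m1n * n1n * M0 * N0, sizeof(float));
  int rc = -1;
  if (a4 && b4 && c4) {
    pack_lhs(a, m, k, a4);
    pack_rhs(b, k, n, b4);
    mmt4d(a4, b4, c4, m1n, n1n, k1n);
    unpack(c4, m, n, c);
    rc = 0;
  }
  free(a4);
  free(b4);
  free(c4);
  return rc;
}
```

For example, matmul(a, b, c, 3, 3, 5) multiplies a 3×3 matrix by a 3×5 matrix. None of these dimensions is a multiple of the tile sizes, and the zero padding keeps the edge tiles correct. In an LLM linear layer the RHS is usually a constant weight tensor. Its packing can therefore, in principle, be done once ahead of time rather than on every call, which is the kind of cost that compile-time constant evaluation removes.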


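Summary of the report's main content:
- **Generative AI workloads**: Transformer-based large language models (LLMs) dominate generative AI workloads such as text, image, and code generation and chatbots.
- **IREE compilation and custom microkernels**: IREE is an MLIR-based compiler and runtime. It supports multiple target architectures and achieves architecture independence through a hardware abstraction layer. Microkernels perform the arithmetic, avoiding the dichotomy between compiler and kernels.
- **RISC-V matrix multiplication microkernels**: F16×F16→F32 RISC-V ukernels were implemented to improve matrix multiplication performance.
- **Performance gains**: In the LLM prefill and decode phases, the custom matrix multiplication microkernels delivered roughly 2× and 50× single-threaded runtime speedups, respectively.
- **Benchmarks**: Benchmarks on the Milk-V Jupiter board show that the pack operation takes more than 60% of compute time in the prefill phase. Compile-time const-eval optimization can eliminate this cost.
- **Summary**: Microkernels significantly improve RISC-V performance on generative AI workloads. The work is expected to drive more open-source contributions and collaboration on optimizing RISC-V-based ML kernels.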
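The summary reports F16×F16→F32 RISC-V ukernels, meaning half-precision inputs with single-precision accumulation. The scalar C sketch below shows only the arithmetic contract of such a ukernel tile. It assumes inputs are stored as raw IEEE 754 binary16 bit patterns in uint16_t and that the RHS is laid out transposed, as in mmt4d. The names f16_to_f32 and tile_f16f16f32 are hypothetical. The actual kernels presumably rely on RVV widening instructions (such as vfwmacc) rather than a software conversion.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 bit pattern to binary32. */
static float f16_to_f32(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t e = (h >> 10) & 0x1Fu;   /* 5-bit biased exponent */
  uint32_t mant = h & 0x3FFu;       /* 10-bit mantissa */
  uint32_t bits;
  float f;
  if (e == 0) {
    /* Zero or subnormal: the value is mant * 2^-24, exact in binary32. */
    f = (float)mant * 0x1p-24f;
    return sign ? -f : f;
  }
  if (e == 31)
    bits = sign | 0x7F800000u | (mant << 13);        /* Inf or NaN */
  else
    bits = sign | ((e + 112u) << 23) | (mant << 13); /* rebias 15 -> 127 */
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* Hypothetical scalar F16 x F16 -> F32 tile:
   acc[m][n] += sum over kk of lhs[m][kk] * rhs[n][kk]  (RHS transposed, as in mmt4d).
   m_tile = 1 gives the matrix-vector shape of the decode phase. */
static void tile_f16f16f32(const uint16_t *lhs, const uint16_t *rhs, float *acc,
                           int m_tile, int n_tile, int k) {
  for (int m = 0; m < m_tile; ++m)
    for (int n = 0; n < n_tile; ++n) {
      float sum = acc[m * n_tile + n];
      for (int kk = 0; kk < k; ++kk)
        /* An f16*f16 product is exact in f32, so only the addition rounds. */
        sum += f16_to_f32(lhs[m * k + kk]) * f16_to_f32(rhs[n * k + kk]);
      acc[m * n_tile + n] = sum;
    }
}
```

An f16×f16 product always fits exactly in binary32. Converting each operand first and then multiplying therefore yields the same product a hardware widening multiply would. Accumulating in f32 also reduces the precision loss that summing long dot products in f16 would cause.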