26-d3s4-4-SiFive_Accelerating the migration from ARM NEON to RISC-V Vectors_Han-Kuan Chen.pdf


Accelerating the migration from ARM NEON to RISC-V Vectors
Han-Kuan Chen
Senior Engineer, SiFive
2023 SiFive

Outline
- What are intrinsics?
- How does software support various intrinsics?
- SiFive Recode
- Improve SiFive Recode
- Arm Compute Library benchmark
- OpenCV benchmark

Acknowledgments
Special thanks to Craig Topper, Kito Cheng, Peter Liao and Yi-Hsiu Hsu, who provided mentorship and guidance.

What are intrinsics?
- Intrinsics are low-level functions provided by the compiler that allow direct access to specific CPU instructions.
- Using intrinsics directly exploits hardware capabilities, which improves the execution speed of performance-critical software tasks.
- Most major vendors (Intel, AMD, ARM, etc.) offer intrinsics.
    x86: SSE & AVX
    arm: NEON
    RISC-V: RVV
- Intrinsics are widely used in software, e.g., TensorFlow, Arm Compute Library, OpenCV, libyuv.

How does software support various intrinsics?
Because many different intrinsics exist, several projects have been proposed to minimize the effort required for porting.
- Provide a universal interface and translate it to different targets, e.g., xnnpack and highway.
- Translate one set of intrinsics internally into another, e.g., simde, AvxToNeon, neon2sse and sse2neon.
RISC-V is new. How do we support the various software and intrinsics?

SiFive Recode
Protect your existing software investment, migrate with confidence.

    #include <arm_neon.h>

    float32_t dot_prod(const float32_t *in1, const float32_t *in2, uint32_t blockSize) {
      float32x4_t acc = vdupq_n_f32(0.0f);
      for (uint32_t i = 0; i != blockSize; i += 4) {
        float32x4_t A = vld1q_f32(in1 + i);
        float32x4_t B = vld1q_f32(in2 + i);
        acc = vmlaq_f32(acc, A, B);
      }
      return vaddvq_f32(acc);
    }

    dot_prod:
        beqz a2, .LBB0_3
        vsetivli zero, 4, e32, mf2, ta, ma
        vmv.v.i v8, 0
    .LBB0_2:
        vle32.v v9, (a0)
        addi a0, a0, 16
        vle32.v v10, (a1)
        addi a1, a1, 16
        vfmacc.vv v8, v9, v10
        addiw a2, a2, -4
        bnez a2, .LBB0_2
        j .LBB0_
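The idea of translating one set of intrinsics into another, as projects such as simde do, can be illustrated with a minimal sketch in portable C. Each NEON call used by the dot_prod kernel is given a scalar stand-in, so the same kernel structure compiles on a target without NEON. Every `shim_` name here is hypothetical and chosen only for illustration; this is not the API or implementation of SiFive Recode or of any project named above, and a real translation layer would map these operations onto the target's vector instructions rather than scalar loops. Like the original kernel, the sketch assumes that blockSize is a multiple of 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for NEON's float32x4_t: four float lanes.
   A real translation layer would map this onto target vector registers. */
typedef struct {
    float lane[4];
} shim_float32x4_t;

/* Models vdupq_n_f32: broadcast one scalar to all four lanes. */
shim_float32x4_t shim_vdupq_n_f32(float x) {
    shim_float32x4_t r = {{x, x, x, x}};
    return r;
}

/* Models vld1q_f32: load four consecutive floats from memory. */
shim_float32x4_t shim_vld1q_f32(const float *p) {
    shim_float32x4_t r = {{p[0], p[1], p[2], p[3]}};
    return r;
}

/* Models vmlaq_f32: lane-wise acc + a * b. */
shim_float32x4_t shim_vmlaq_f32(shim_float32x4_t acc,
                                shim_float32x4_t a,
                                shim_float32x4_t b) {
    for (int i = 0; i < 4; ++i) {
        acc.lane[i] += a.lane[i] * b.lane[i];
    }
    return acc;
}

/* Models vaddvq_f32: add the four lanes into one scalar. */
float shim_vaddvq_f32(shim_float32x4_t v) {
    return (v.lane[0] + v.lane[1]) + (v.lane[2] + v.lane[3]);
}

/* Same structure as the dot_prod kernel, written against the shim.
   Assumption: blockSize is a multiple of 4, as the original loop requires. */
float shim_dot_prod(const float *in1, const float *in2, uint32_t blockSize) {
    assert(blockSize % 4 == 0);
    shim_float32x4_t acc = shim_vdupq_n_f32(0.0f);
    for (uint32_t i = 0; i != blockSize; i += 4) {
        shim_float32x4_t a = shim_vld1q_f32(in1 + i);
        shim_float32x4_t b = shim_vld1q_f32(in2 + i);
        acc = shim_vmlaq_f32(acc, a, b);
    }
    return shim_vaddvq_f32(acc);
}
```

Calling `shim_dot_prod(a, b, 8)` on two 8-element arrays accumulates four partial sums, one per lane, and adds them together once at the end. This mirrors how the vector kernel keeps its accumulator in a vector register until the final reduction.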
