2023 SiFive

Accelerating the migration from ARM NEON to RISC-V Vectors
Han-Kuan Chen
Senior Engineer, SiFive

Outline
- What are intrinsics?
- How does software support various intrinsics?
- SiFive Recode
- Improve SiFive Recode
- Arm Compute Library benchmark
- OpenCV benchmark

Acknowledgments
Special thanks to Craig Topper, Kito Cheng, Peter Liao and Yi-Hsiu Hsu, who provided mentorship and guidance.

What are intrinsics?
- Intrinsics are low-level functions provided by the compiler that allow direct access to specific CPU instructions.
- Directly using intrinsics leverages hardware capabilities, which improves
execution speed of performance-critical software tasks.
- Most major vendors (Intel, AMD, ARM, etc.) offer intrinsics.
    x86: SSE & AVX
    arm: NEON
    RISC-V: RVV
- Intrinsics are widely used in software, e.g., TensorFlow, Arm Compute Library, OpenCV, libyuv.

How does software support various intrinsics?
- Because so many intrinsic sets exist, several projects have been proposed to minimize the effort required for porting.
- Provide a universal interface and translate it to different targets, e.g., xnnpack and highway.
- Translate one set of intrinsics internally into another, e.g., simde, AvxToNeon, neon2sse and sse2neon.
- RISC-V is new; how do we support various software and intrinsics?

SiFive Recode
Protect your existing software investment, migrate with confidence.

    #include <arm_neon.h>

    float32_t dot_prod(const float32_t *in1, const float32_t *in2,
                       uint32_t blockSize) {
      float32x4_t acc = vdupq_n_f32(0.0f);
      for (uint32_t i = 0; i != blockSize; i += 4) {
        float32x4_t A = vld1q_f32(in1 + i);
        float32x4_t B = vld1q_f32(in2 + i);
        acc = vmlaq_f32(acc, A, B);
      }
      return vaddvq_f32(acc);
    }

    dot_prod:
        beqz      a2, .LBB0_3
        vsetivli  zero, 4, e32, mf2, ta, ma
        vmv.v.i   v8, 0
    .LBB0_2:
        vle32.v   v9, (a0)
        addi      a0, a0, 16
        vle32.v   v10, (a1)
        addi      a1, a1, 16
        vfmacc.vv v8, v9, v10
        addiw     a2, a2, -4
        bnez      a2, .LBB0_2
        j         .LBB0_
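
The "translate one set of intrinsics into another" approach mentioned earlier can be illustrated with a small portable sketch. The shim below is hypothetical: it defines scalar C stand-ins for the four NEON intrinsics used by dot_prod, so the function compiles without arm_neon.h. It is not how SiFive Recode, simde, or sse2neon are implemented; a real translation layer would presumably map each call to the target's vector intrinsics rather than to scalar loops. The assert on blockSize is also an addition, making explicit the assumption the slide's loop relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar stand-ins for the NEON types used on the slide.
 * These are illustrative only and must not be mixed with <arm_neon.h>. */
typedef float float32_t;
typedef struct { float32_t lane[4]; } float32x4_t;

/* Broadcast one value to all four lanes. */
static inline float32x4_t vdupq_n_f32(float32_t value) {
    float32x4_t r;
    for (int i = 0; i < 4; ++i) r.lane[i] = value;
    return r;
}

/* Load four consecutive floats from memory. */
static inline float32x4_t vld1q_f32(const float32_t *ptr) {
    float32x4_t r;
    for (int i = 0; i < 4; ++i) r.lane[i] = ptr[i];
    return r;
}

/* Per-lane multiply-accumulate: acc + a * b. */
static inline float32x4_t vmlaq_f32(float32x4_t acc, float32x4_t a,
                                    float32x4_t b) {
    for (int i = 0; i < 4; ++i) acc.lane[i] += a.lane[i] * b.lane[i];
    return acc;
}

/* Horizontal add of the four lanes. */
static inline float32_t vaddvq_f32(float32x4_t v) {
    return (v.lane[0] + v.lane[1]) + (v.lane[2] + v.lane[3]);
}

/* The slide's function, unchanged apart from the added assert. */
float32_t dot_prod(const float32_t *in1, const float32_t *in2,
                   uint32_t blockSize) {
    /* Assumption made explicit: the loop only terminates correctly
     * when blockSize is a multiple of 4. */
    assert(blockSize % 4 == 0);
    float32x4_t acc = vdupq_n_f32(0.0f);
    for (uint32_t i = 0; i != blockSize; i += 4) {
        float32x4_t A = vld1q_f32(in1 + i);
        float32x4_t B = vld1q_f32(in2 + i);
        acc = vmlaq_f32(acc, A, B);
    }
    return vaddvq_f32(acc);
}
```

Calling dot_prod on two 8-element arrays runs the loop body twice and reduces the four accumulator lanes once at the end, which mirrors the structure of the vector loop in the assembly listing.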