AutoIREE: Automatic Performance Tuning for AI Models on RISC-V Vector Architectures
洪培翔, 張元銘, 吳奕緯
Andes Technology Corporation
RISC-V Summit China, 2024/08/22

Subject to change without notice. Copyright 2021-2024 Andes Technology.

Outline
  Background Introduction
  IREE
  Andes Vector Processor Families
  Implementation and Innovation
  Experiment Results
  Future Works

IREE, MLIR-based Compiler + Runtime
  Model input: Prebuilt AI Model (PyTorch, TF, TFL)
  Targets: Andes Cores, GPUs
  (Courtesy of IREE website)

Andes Vector Processor Families
[Figure: processor family roadmap, ordered toward "more features". Labels recovered from the diagram:]
  Processors: AX25-V100, NX27V, AX45MPV, AX46MPV*, AX47MPV*
  Pipeline: 5-stage single-issue; 8-stage dual-issue with shared cache for multicore
  Vector ISA: RVV 0.8; RVV 1.0
  VLEN: 128, 256, 512; VLEN: 128, 256, 512, 1024 + 2048
  Data types: int4-64, fp16-64; bf16 (conv.) + bf16 (full arithmetic) + fp8
  ACE (Andes Automated Custom Extension); ACE for RVV
  HVM (High-speed Vector Memory) Interface
  8-core cluster; 16-core cluster with private L1/L2
  Integrated Matrix Ext. (IME)
  * Future products subject to change
AX25-V100 is adopted in Meta's Training and Inference Accelerator (MTIA) v1.
AX45MPV VPU Features
[Figure: VPU block diagram. Labels: VIQ, scoreboard, vector registers V0, V1, ..., V31 (VLEN), VALU, VMAC, Vector L/S Control, RVV Ld/St, Forwarding/Chaining]
  Dual Issue/Dispatch, Out-of-order execution
  Multiple Vector Functional Units (VFUs) operating independently and simultaneously
  Up to 5 DLEN results are generated per cycle
  Support precise exception (optional)
  ACE-RVV

VLEN/DLEN/BIU Combinations
  VLEN    SIMD (DLEN)    BIU (AXI)
  1024    1024           512/256/128
  1024    512            512/256/128
  512     512            512/256/128
  512     256            256/128
  256     256            256/128
  256     128            128
  128     128            128

Implementation
  Tuner
  Transform library controls:
    Size of tiling level
    Size of Ca