State-of-the-Art Model Quantization and Optimization for Efficient Edge AI
Hyunjin Kim, Compiler Team Lead, DEEPX AI

DEEPX: Edge AI Solution (© 2023 DeepX AI)
- NPU technology: developing one of the most efficient NPU technologies
- NPU-based SoC technology: developing SoC ASICs with NPUs for commercial products
- AI HW/SW optimization: optimizing both AI hardware and software to provide the highest NPU efficiency (the focus of this talk)

DEEPX Mission: satisfy the edge AI inference requirements
- Maintain high accuracy with quantization
- Support various AI models (operation coverage)
- Minimize HW cost: SRAM

Our Solution: HW/SW Co-Design

High Accuracy with Quantization
- Selective quantization to avoid accuracy drops: 8-bit quantized ops + FP32 ops

                 Norm. accuracy over FP32 (avg.)   Time cost
  DXNN-Lite      99.01%                            1 min
  DXNN-Pro       99.42%                            up to 12 hours
  DXNN-Master    100.54%                           50+ GPU hours (training env.)

Operation Coverage
- High operation coverage with minimum implementation cost (HW/SW co-design)
- Increase op coverage via combinations of supported ops:
  - TransConv: Resize + Padding + Conv
  - DilatedConv: multiple Convs
  - ArgMax: two-stage implementation

Minimize HW Cost: SRAM
- DEEPX NPUs have small SRAM sizes (1 MB, 2.25 MB, and 8.25 MB across products)
- The DEEPX compiler's memory optimization maximizes SRAM utilization to reduce DRAM accesses
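The selective-quantization idea (quantize ops to 8-bit, fall back to FP32 where accuracy would suffer) can be sketched as an error-driven precision picker. This is an illustrative stand-in, not the DXNN algorithm: the function names and the round-trip-error criterion are assumptions, whereas a production tool would measure end-to-end accuracy on calibration data.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: returns (q, scale)."""
    max_abs = np.max(np.abs(w))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def select_precision(layers, max_rel_error=0.02):
    """Keep a layer in FP32 when its int8 round-trip error is too large.
    layers: dict of name -> FP32 weight array. Returns name -> 'int8'/'fp32'."""
    plan = {}
    for name, w in layers.items():
        q, s = quantize_int8(w)
        err = np.linalg.norm(w - dequantize(q, s)) / (np.linalg.norm(w) + 1e-12)
        plan[name] = "int8" if err <= max_rel_error else "fp32"
    return plan
```

A layer with large outlier weights stretches the per-tensor scale, inflating the quantization error, so the picker leaves it in FP32 while well-behaved layers go to int8.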
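The "DilatedConv: multiple Convs" decomposition can be illustrated in 1-D: a conv with dilation d equals d plain convs run over the d strided phases of the input, with the phase outputs interleaved back. This is a generic sketch of that identity; the function names are mine, and DEEPX's actual lowering may differ in layout and tiling details.

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 'valid' 1-D convolution (cross-correlation form)."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def dilated_conv1d(x, k, d):
    """Dilated conv lowered onto d plain convs:
    y[i] = sum_j x[i + j*d] * k[j], computed phase by phase."""
    n_out = len(x) - (len(k) - 1) * d
    y = np.empty(n_out)
    for r in range(d):
        phase_out = conv1d_valid(x[r::d], k)   # plain conv on one input phase
        y[r::d] = phase_out[: len(y[r::d])]    # interleave back, trim extras
    return y
```

With d = 1 this degenerates to a single plain conv, which is a handy sanity check.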
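The SRAM constraint shapes compilation: a conv layer whose tensors exceed on-chip SRAM must be tiled so each tile's working set (input tile with halo, weights, output tile) fits, keeping intermediate traffic out of DRAM. A minimal row-tiling sketch under an assumed byte-counting model; a real compiler would also consider data layouts, double buffering, and inter-layer fusion:

```python
def plan_tiles(h, w, c_in, c_out, k, sram_bytes, elem=1):
    """Largest row-tile of a kxk conv whose input tile, weights, and
    output tile fit in SRAM together. elem: bytes per element (1 = int8).
    Returns the tile height in output rows, or None if nothing fits."""
    weights = k * k * c_in * c_out * elem
    for rows in range(h, 0, -1):
        in_tile = (rows + k - 1) * w * c_in * elem   # halo rows for the kernel
        out_tile = rows * w * c_out * elem
        if weights + in_tile + out_tile <= sram_bytes:
            return rows
    return None
```

For a 224x224x64 -> 64 layer with a 3x3 kernel, a 1 MB SRAM forces tiles of a few dozen rows, while a much larger SRAM holds the whole layer at once.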