(1/13)
A 13.7 µJ/prediction 88% Accuracy CIFAR-10 Single-Chip Wired-logic Processor in 16-nm FPGA using Non-Linear Neural Network
Yao-Chung Hsu, Atsutake Kosuge, Rei Sumikawa, Kota Shiba, Mototsugu Hamada, Tadahiro Kuroda
The University of Tokyo

Abstract
In this study, we propose a 13.7 µJ/prediction 88% accuracy CIFAR-10 single-chip wired-logic processor in 16-nm FPGA by utilizing a newly developed 98%-pruned ultra-sparse, binary-weight nonlinear neural network (NNN) and a shift-register based pipelined wired-logic architecture. Compared with the state-of-the-art FPGA-based processor, 2,036 times better energy efficiency is achieved.

(2/13)
Introduction
Pace of Energy Efficiency Improvement Slowing
Processor Element (PE) Bit Width Already Reduced to 1b
Processors Using Only On-chip SRAM Already Realized
Power-Hungry SRAM Access also should be eliminated

(3/13)
Energy table for 45nm CMOS
[Figure: Conventional von-Neumann AI Processor. Block labels: Data in, On-chip SRAM, MAC units, Output Data, DRAM.]
[Figure: Energy Efficiency Trend in ISSCC. Y-axis: Energy Efficiency [TOPS/W], log scale from 10^0 to 10^4. X-axis: Year, 2016 to 2020. Data point labels: 16bit 65nm, 4bit 65nm, 1bit 28nm, 1bit 65nm, 1bit 65nm. Annotations: "100x", "No Improvement".]

Wired-logic Architecture
Goal: Energy-efficient AI Processor by eliminating the memory access.
Ex. Implementing 88% Acc. CIFAR-10 SNN requires 3,080 mm^2 in 28 nm, resulting in 8 TrueNorth chips [4]. It requires power-hungry chip-to-chip I/F, resulting in poor energy efficiency.

(4/13)
[Figure: Wired-logic AI Processor. Block labels: Input data, PE, Output data. PE: Processing element.]
[Figure: Conventional wired-logic architecture. Block labels: Input, Chip 1, Chip-to-Chip I/F, Chip 2, Output.]

Energy Consumption Comparison (energy per operation)
Conv. processor: Total 6.8 pJ; SRAM 5 pJ; Mul. 1.6 pJ
Wired-logic: Total 3.4 pJ; Mul. 1.6 pJ; Chip-Chip I/F 1.7 pJ

Our Research Goal
Goal: Single Chip AI Processor with 2,036x Higher Energy Efficiency Using (A) 98% pruned ultra-sparse, binary-weight nonlinear neural ne