A Holistic FPGA Architecture Exploration Framework for Deep Learning Acceleration
Jiadong Zhu, Dongsheng Zuo, Yuzhe Ma
January 23, 2025
The Hong Kong University of Science and Technology (Guangzhou)

Table of Contents
  FPGA Architecture Through A DL Lens
  Previous Work on Improving FPGA Architectures
  FPGA Architecture Exploration Framework Overview
  Multi-objective FPGA Architecture Search
  Experiments

FPGA Architecture Through A DL Lens

Accelerator Market Trends
  FPGA accelerators are expected to grow steadily over the forecast period. [1]
  [1] Grand View Research, "Data center accelerator market size, share & trends analysis report by processor (CPU, GPU, FPGA, ASIC)," 2024. [Online]. Available: https:/

between FPGA and Other Platforms
  FPGAs occupy an intermediate position on the spectrum of efficiency versus programmability, striking a unique balance in DL acceleration.

FPGA Architecture Overview
  Figure labels: CLB, DSP, BRAM, Other custom blocks
  Figure panels: Early FPGA architecture; Modern heterogeneous FPGA architecture
  Blocks and their strength for DL

Strength: Flexible Precision & Efficient Computing Implementation
  CLB
    Most numerous block type
    Can be programmed to implement hardware of any bit width
    Use the lowest precision that meets accuracy for each network/layer
  Programmable routing: directly wire data from one unit to another
  Programmable logic: perform only the necessary operations

Strength: Hard Blocks & Low Latency Memory
  Source: Vaughn Betz's slides from the tutorial on Deep Learning-Optimized FPGA Architectures at MICRO 2022
  Hard block DSP: designed to speed up multiply-accumulate (MAC) operations
  Massive bandwidth BRAM
    Pb/s of on-chip bandwidth (in a large chip), so little or no batching is needed
    GPUs batch inputs to amortize weight re-loading, which increases latency

How to Make FPGA Architecture More Suitable for DL Acceleration?
  Existing FPGA architectures are not designed specifically for DL workloads
  CLB, DSP, BRAM, Other