1. TRIFP-DCIM: A Toggle-Rate-Immune Floating-point Digital Compute-in-Memory Design with Adaptive-Asymmetric Compute-Tree
Xing Wang, Tianhui Jiao, Shaochen Li, Yuchen Ma, Zhican Zhang, Zhichao Liu, Xi Chen and Xin Si
ASP-DAC 2025
January 23, 2025

2. Challenges and proposed solutions
Current FP-DCIM suffers from:
  Large area overhead of mantissa alignment
  Large-scale adder tree limits the energy/area efficiency
  Less sparsity utilization
Solutions in this work. TRIFP-DCIM proposes:
  Serial Shift Scheme for BF16 MAC
  Adaptive Asymmetric Compute-tree (AACT) Circuit
  Trifecta: one method of increasing data sparsity

3. TRIFP-CIM Overall Architecture
  Mantissa MAC Block (MMACB)
  Shift and Accumulate Block (SAB)
  Exponent Accumulation Block (EAB)
  Mantissa Shift Block (MSB)
The mantissa needs to be processed and expanded to 9 bits, adding a hidden 1 and a sign bit.

4. BF16 Computation Dataflow
1) Compute 9-bit exponent sums from 8-bit exponents.
2) Find the maximum exponent using a comparator tree.
3) Serially shift the mantissa based on control signals.
4) Perform multiply-accumulate to get PMACV.
5) Integrate PMACV and the exponent into BF16 format.

5. Serial Shift Module
Clock Gating Units & Shift Register Stack (SRS)
An SR is made up of 11 serially connected registers.
Highest 2 bits: sign bit extension; lower 9 bits: mantissa.
Each cycle, 2 bits of the mantissa are delivered to the subsequent computation circuit.

6. Serial Shift Scheme: An Example
1) Given E1 = 011101010 and E2 = 011100111.
2) The counter counts down from 01110101.
3) The counter value matches E1 at T0, then the first SR starts shifting.
4) The counter value continues to decrease by one at T1 and T2.
5) The second SR starts shifting at T3.
6) Continue shifting until the specified cycle number is reached.
When the counter value aligns with E1, the first SR enters the Find state at T0.

7. Serial Shift Scheme: An Example
1) Given E1 = 011101010 and E2 = 0
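
The start timing in the serial shift example (E1 = 011101010, E2 = 011100111) can be reproduced with a minimal Python sketch. It assumes the counter is loaded with the largest exponent and decrements once per cycle, and that a shift register begins shifting in the cycle where its exponent equals the counter value. The function name `shift_start_cycles` and the `num_cycles` value are illustrative and do not come from the slides. The sketch models only the start timing, not the 2-bit-per-cycle mantissa datapath.

```python
def shift_start_cycles(exponents, num_cycles):
    """Return, per shift register, the cycle index at which it starts shifting.

    An entry stays None if the counter never reaches that exponent
    within num_cycles.
    """
    # Assumption: the counter starts at the maximum exponent
    # (the value found by the comparator tree in the dataflow).
    counter = max(exponents)
    start = [None] * len(exponents)
    for t in range(num_cycles):
        for i, e in enumerate(exponents):
            # A register starts shifting when the counter matches its exponent.
            if start[i] is None and e == counter:
                start[i] = t
        counter -= 1  # the counter decreases by one every cycle
    return start


E1 = int("011101010", 2)  # 234
E2 = int("011100111", 2)  # 231
starts = shift_start_cycles([E1, E2], num_cycles=8)  # 8 cycles is a hypothetical limit
print(starts)  # -> [0, 3]
```

With these inputs the first register starts at T0 and the second at T3, matching the timeline in the example: the difference between the two exponents (3) equals the number of cycles the smaller-exponent register waits before it begins shifting.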