ISSCC 2025
SESSION 23
AI-Accelerators

23.1: T-REX: A 68-567μs/token, 0.41-3.95μJ/token Transformer Accelerator with Reduced External Memory Access and Enhanced Hardware Utilization in 16nm FinFET

Seunghyun Moon1, Mao Li1, Gregory K. Chen2, Phil C. Knag2, Ram Kumar Krishnamurthy2, Mingoo Seok1
1 Columbia University, New York, NY
2 Intel, Hillsboro, OR

2025 IEEE International Solid-State Circuits Conference

Outline
I. Introduction
II. Algorithmic Approaches
   A. Factorizing Training
   B. External Data Compression
III. Overall Architecture
IV. Detailed Features
   A. Hardware Support for External Data Compression
   B. Dynamic Batching
   C. Two-Direction Accessible Register File
V. Measurement Results
VI. Conclusion

Challenges in Transformer Accelerators
Large External Memo