3D-METRO: Deploy Large-Scale Transformer Model on a Chip Using Transistor-Less 3D-Metal-ROM-Based Compute-in-Memory Macro
Yiming Chen*, Xirui Du*, Guodong Yin, Wenjun Tang, Yongpan Liu, Huazhong Yang, and Xueqing Li
Department of Electronic Engineering, LFET/BNRist, Tsinghua University
*These authors contributed equally to this work.

Outline
- Background
- Motivation
- Proposed Design
- Benchmark
- Conclusion

Background
Large language models (LLMs) based on the Transformer (A. Vaswani et al., 2017) are blooming in both CV and NLP. Furthermore, multimodal large models such as GPT-4 (J. Achiam et al., 2023) and MiniGPT-4 (D. Zhu et al., 2023) deliver human-like AI across a wide range of applications.
[Figure: a Transformer-based multimodal LLM with image and audio encoders; the pipeline runs Input Embedding → Multi-Head Attention → Feed Forward → Text Prediction, where attention computes MatMul(Q, K) → Scale → Mask → SoftMax → MatMul with V.]
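As a reference for the attention dataflow sketched in the figure above, the following is a minimal NumPy sketch of scaled dot-product attention (Vaswani et al., 2017); the function name and shapes are illustrative, not from the slides.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Illustrative scaled dot-product attention.

    Q, K, V: (seq_len, d_k) matrices; mask: optional boolean matrix
    marking positions to block (e.g., causal masking in a decoder).
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # MatMul(Q, K) -> Scale
    if mask is not None:
        scores = np.where(mask, -1e9, scores)  # Mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # SoftMax
    return weights @ V                         # MatMul with V
```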
Background
SRAM-CiM, especially in high-density designs, enables the in-memory attention mechanism. However, in short-sequence scenarios such as edge-side inference, the limited on-chip density still incurs a serious weight-dumping overhead.
[Figure: model size vs. year (AlexNet, ResNet-34, BERT, MiniGPT-4; roughly 100M to 4B parameters over 2012-2024) against the much flatter on-chip SRAM capacity trend; energy breakdown into Computing, Weight Loading, and Act Accessing across image sizes (S: 32x32, M: 512x512, L: 1080p) and sequence lengths (S: 12, M: 128, L: 512), with the improvement achievable under full on-chip deployment.]
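To see why short sequences make weight dumping dominate, consider a back-of-the-envelope estimate: a weight matrix loaded from off-chip is reused once per token, so its loading energy is amortized over fewer MACs when the sequence is short. All constants below are illustrative assumptions, not measurements from the paper.

```python
# Illustrative amortization estimate; the energy constants are assumptions.
E_LOAD_PER_BYTE = 4.0  # pJ/byte, off-chip weight loading (assumed)
E_MAC = 0.05           # pJ per multiply-accumulate (assumed)

def weight_loading_share(n_params, seq_len, bytes_per_weight=1):
    """Fraction of energy spent loading weights for one forward pass
    of a weight matrix reused across seq_len tokens."""
    e_load = n_params * bytes_per_weight * E_LOAD_PER_BYTE
    e_compute = n_params * seq_len * E_MAC  # one MAC per weight per token
    return e_load / (e_load + e_compute)

for L in (12, 128, 512):  # the S/M/L sequence lengths from the slide
    print(L, round(weight_loading_share(10_000_000, L), 3))
```

Under these assumed constants, weight loading takes about 87% of the energy at sequence length 12 but only about 14% at 512, which matches the qualitative trend in the breakdown above.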
Background
Recently, a high-density ROM-CiM (G. Yin et al., 2023) has been proposed to address the limited-density challenge of SRAM-CiM. By introducing SRAM-CiM to hold fine-tuning weights, YOLoC and Hidden-ROM (Y. Chen et al., 2022) relieve the flexibility bottleneck of ROM-based designs.
[Figure: three macro configurations, each with input drivers, ADCs, and LCCs: an SRAM-only macro (Domain A empty) with two SRAM arrays, a macro pairing an SRAM array with a pretrained ROM array (Domain A pretrained), and a macro with a randomized Domain A plus SRAM.]
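A minimal sketch of the hybrid idea, assuming (in the spirit of YOLoC/Hidden-ROM-style designs) that the fixed pretrained weights live in ROM arrays while a small SRAM-resident correction supplies flexibility; the low-rank decomposition below is an illustrative stand-in, not the exact macro dataflow.

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed pretrained weights, mask-programmed into ROM at fabrication time.
W_rom = rng.standard_normal((256, 256))
# Small writable correction kept in SRAM-CiM for domain fine-tuning;
# a low-rank form is assumed here to keep the SRAM footprint small.
A = rng.standard_normal((256, 8)) * 0.01
B = rng.standard_normal((8, 256)) * 0.01

def hybrid_matvec(x):
    """ROM supplies the immutable bulk of the weights; SRAM supplies a
    small updatable correction, so the deployed model can be adapted to
    a new domain without re-fabricating the ROM."""
    return x @ W_rom + (x @ A) @ B

y = hybrid_matvec(rng.standard_normal(256))
```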