VTA-NIC: Deep Learning Inference Serving in Network Interface Cards
Kenji Tanaka (1), Yuki Arikawa (1), Kazutaka Morita (2), Tsuyoshi Ito (1), Takashi Uchida (3), Natsuko Saito (3), Shinya Kaji (3), Takeshi Sakamoto (1)
(1) NTT Device Technology Labs, (2) NTT Software Innovation Center, (3) Fixstars Corporation
Hotchips 34 Posters, August 21-23, 2022, Virtual Conference

Abstract: VTA-NIC Chip Architecture
We aim to achieve DL inference serving (DLIS) without CPU interference. We integrate hardware data paths as a NIC (Network Interface Card), a REST API parser/deparser, and multiple VTAs (Versatile Tensor Accelerators).
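The poster does not spell out the wire format, but a DLIS call in this style is typically a small REST request carrying a tensor payload. The Python sketch below shows the kind of HTTP POST the on-NIC parser/deparser would terminate; the endpoint path, JSON schema, and host name are all hypothetical, assumed here for illustration.

# Minimal client-side sketch of a REST inference call. The endpoint
# (/v1/models/resnet:predict), JSON schema, and host are assumptions;
# the poster only states that a REST API parser/deparser is
# implemented in the NIC hardware.
import json
import urllib.request

def infer(host: str, inputs: list) -> dict:
    body = json.dumps({"inputs": inputs}).encode("utf-8")
    req = urllib.request.Request(
        url=f"http://{host}/v1/models/resnet:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # deparsed response, e.g. {"outputs": [...]}

if __name__ == "__main__":
    print(infer("vta-nic.example:8080", [[0] * 224] * 224))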
Configuration: VTA-NIC
  Process node:       16 nm FinFET (Xilinx FPGA)
  Number of cores:    8 VTA cores
  Core frequency:     213 MHz
  MACs per core:      169
  Memory throughput:  19.2 GB/s (DDR4-2400)
  Number precision:   INT8
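As a quick sanity check, the table's figures imply a peak compute rate of roughly 0.29 TMAC/s, or about 0.58 INT8 TOPS under the usual convention of two ops (multiply + add) per MAC. A minimal sketch of the arithmetic:

# Back-of-the-envelope peak throughput from the configuration table.
# Assumes the common convention of 2 ops (multiply + add) per MAC;
# achieved throughput depends on utilization and memory bandwidth.
cores = 8
macs_per_core = 169
freq_hz = 213e6  # 213 MHz

peak_mac_s = cores * macs_per_core * freq_hz  # ~2.88e11 MAC/s
peak_tops = 2 * peak_mac_s / 1e12             # ~0.58 INT8 TOPS
print(f"{peak_mac_s / 1e12:.2f} TMAC/s, {peak_tops:.2f} INT8 TOPS")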
Abstract: Performance
Power Efficiency: The DLIS power efficiency of VTA-NIC is 6.1x better than that of a GPU (NVIDIA V100).
Tail Latency: At high load, the tail latency of heterogeneous systems unexpectedly increases. With our chip, the tail latency is predictable, since it is proportional to the load.
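Tail latency here means the high percentiles (e.g., the 99th) of the request-latency distribution. A minimal sketch of how such a figure is computed from measured samples, using a simple rank-based percentile (this is the standard definition, not the authors' measurement code, and the sample values are made up):

# Compute p50/p99 tail latency from measured request latencies.
def percentile(samples: list, p: float) -> float:
    s = sorted(samples)
    k = min(len(s) - 1, round(p * (len(s) - 1)))  # rank for fraction p
    return s[k]

latencies_ms = [1.2, 1.3, 1.1, 9.8, 1.2, 1.4, 1.3, 1.2, 1.5, 1.3]
print(f"p50 = {percentile(latencies_ms, 0.50):.1f} ms")  # median
print(f"p99 = {percentile(latencies_ms, 0.99):.1f} ms")  # tail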
Background
Recently, web applications are often built on microservices. DL Inference Serving (DLIS) is one of those microservices [1]. DLIS is provisioned with a special accelerator instance [2]. The microservices/instances are loosely coupled via APIs.
[Figure: microservice architecture with CPU- and GPU-backed instances]

Accelerator instances risk inefficient data movement.
1. Moving data via host processors decreases the accelerators' utilization (sketched below).
3a. In our pre
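Point 1 above is the conventional, host-mediated path that VTA-NIC removes: the host CPU terminates the connection, parses the REST request, and stages data to and from the accelerator over PCIe. A schematic Python sketch of that loop, with a stub class standing in for a real accelerator API (all names are hypothetical):

# Schematic of a conventional host-mediated DLIS request path.
# StubAccel stands in for a real accelerator API; the point is that
# parsing and both PCIe copies all run on, or are driven by, the CPU.
import json

class StubAccel:
    def h2d_copy(self, t):   # host -> device over PCIe
        return t
    def run(self, t):        # device-side compute (stubbed)
        return [sum(t)]
    def d2h_copy(self, t):   # device -> host over PCIe
        return t

def handle_request(raw: bytes, accel: StubAccel) -> bytes:
    tensor = json.loads(raw)["inputs"]            # CPU parses REST body
    dev = accel.h2d_copy(tensor)                  # CPU stages input
    out = accel.d2h_copy(accel.run(dev))          # CPU copies result back
    return json.dumps({"outputs": out}).encode()  # CPU builds response

print(handle_request(b'{"inputs": [1, 2, 3]}', StubAccel()))

In the VTA-NIC design, the parse/deparse and data-staging steps move into the NIC hardware, so a request never has to detour through host memory on its way to the accelerators.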