Computer Architecture and Memory systems Laboratory (CAMELab)

[Figure: a graph with numeric feature vectors (values between 0 and 1), followed by an MLP]

[Figure: plot over graph size (# of edges), with an OOM marker]

[Figure: system diagram. FPGA with four DRAM modules; NVMe SSD with DRAM, controller (CTRL), and four NAND flash packages]

[Figure: accelerator configurations on the FPGA]
Octa-core: Core0 to Core7
Many SAs: systolic arrays and Core0
Hetero: vector processor, systolic array, and Core0

[Figure: prototype, labeled 14nm FPGA and SSD]

[Figure: comparison of GTX 1060, RTX 3090, and HolisticGNN]
Large graph: 100.4x faster
Small graph: 1.69x

[Figure: comparison of GTX 1060, RTX 3090, and HolisticGNN on the small graph and the large graph]
453.2x lower, due to the low-power computing of the FPGA
33.2x and 16.3x better than the RTX 3090 and GTX 1060, respectively

Demonstration Video Link: https:/ (panels: HolisticGNN, High-Performance GPU)

Original publication: M. Kwon, D. Gouk, S. Lee, and M. Jung. Hardware/Software Co-Programmable Framework for Computational SSDs to Accelerate Deep Learning Service on Large-Scale Graphs. USENIX FAST 2022. (https://www.usenix.org/system/files/fast22-kwon.pdf)

Acknowledgment: This research is supported by the Samsung Research Funding & Incubation Center of Samsung Electronics (SRFC-IT2101-04). Myoungsoo Jung is the corresponding author.