RAPIDS: A GPU-Based Acceleration Library for Data Science and Analytics, Overview and Update
Sheng Luo, NVIDIA

WHY GPUS?
Numerous hardware advantages (NVIDIA DGX A100 system)
- Thousands of cores with up to ~20 TeraFLOPS of general-purpose compute performance
- Up to 1.5 TB/s of memory bandwidth
- Hardware interconnects for up to 600 GB/s of bidirectional GPU-to-GPU bandwidth
- Can scale up to 16 GPUs in a single node
- Almost never run out of compute relative to memory bandwidth!
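The per-device figures above can be sanity-checked from Python. The following is a minimal sketch, not from the deck, assuming CuPy is installed and a CUDA GPU is visible; it reads the device properties and estimates peak memory bandwidth from the reported memory clock and bus width (device index 0 is an assumption).

```python
import cupy as cp

# Query CUDA device properties for the first visible GPU (assumption: device 0).
props = cp.cuda.runtime.getDeviceProperties(0)
name = props["name"].decode() if isinstance(props["name"], bytes) else props["name"]

# Theoretical peak bandwidth: 2 transfers per clock * memory clock (kHz) * bus width (bytes).
peak_bw_gbs = 2 * props["memoryClockRate"] * 1e3 * (props["memoryBusWidth"] / 8) / 1e9

print(f"{name}: {props['multiProcessorCount']} SMs, "
      f"{props['totalGlobalMem'] / 2**30:.0f} GiB device memory, "
      f"~{peak_bw_gbs:.0f} GB/s peak memory bandwidth")
```

On an A100 this estimate lands roughly in line with the 1.5 TB/s quoted above.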
DATA PROCESSING EVOLUTION
Faster Data Access, Less Data Movement

- Hadoop processing, reading from disk:
  HDFS read -> Query -> HDFS write -> HDFS read -> ETL -> HDFS write -> HDFS read -> ML Train
- Spark in-memory processing (25-100x improvement; less code; language flexible; primarily in-memory):
  HDFS read -> Query -> ETL -> ML Train
- Traditional GPU processing (5-10x improvement; more code; language rigid; substantially on GPU):
  HDFS read -> GPU read -> Query -> CPU write -> GPU read -> ETL -> CPU write -> GPU read -> ML Train
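To make the contrast concrete, here is a minimal sketch, not from the deck, of the same Query -> ETL -> ML Train stages written in the disk-round-trip style and then in the in-memory style; pandas and the Parquet file names are stand-ins chosen for illustration.

```python
import pandas as pd

# Hadoop-style: every stage round-trips through storage (file names are hypothetical).
df = pd.read_parquet("raw.parquet")                         # read
df.query("amount > 0").to_parquet("queried.parquet")        # Query, then write
feat = pd.read_parquet("queried.parquet")                   # read again
feat["amount_scaled"] = feat["amount"] / feat["amount"].max()  # ETL
feat.to_parquet("etl.parquet")                              # write
train_input = pd.read_parquet("etl.parquet")                # read again before ML Train

# Spark-style in-memory: the same stages chained with no intermediate writes.
train_input2 = (
    pd.read_parquet("raw.parquet")
      .query("amount > 0")
      .assign(amount_scaled=lambda d: d["amount"] / d["amount"].max())
)
```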
DATA MOVEMENT AND TRANSFORMATION
The Bane of Productivity and Performance
[Diagram: APP A and APP B each load/read data on the CPU and repeatedly copy & convert it between CPU and GPU memory as it moves from one application to the next.]

DATA MOVEMENT AND TRANSFORMATION
What if We Could Keep Data on the GPU?
[Diagram: the same pipeline with the CPU-GPU copy & convert steps eliminated; the data stays resident in GPU memory as it passes between APP A and APP B.]
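A minimal sketch, not from the deck, of what keeping data on the GPU looks like in practice with RAPIDS: the DataFrame is created in GPU memory by cuDF, transformed there, and passed straight to cuML with no host copy or format conversion. The file name and column names are hypothetical; cuDF and cuML are assumed to be installed.

```python
import cudf
from cuml.cluster import KMeans

# Load directly into GPU memory (no pandas round trip); file name is hypothetical.
gdf = cudf.read_csv("transactions.csv")

# ETL stays on the GPU: filter rows and derive a feature without copying to host.
gdf = gdf[gdf["amount"] > 0]
gdf["amount_scaled"] = gdf["amount"] / gdf["amount"].max()

# The same GPU-resident DataFrame feeds model training; no copy & convert step.
model = KMeans(n_clusters=8).fit(gdf[["amount_scaled"]])
print(model.cluster_centers_)
```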
LEARNING FROM APACHE ARROW

Before Arrow (Pandas, Spark, Parquet, HBase, Cassandra, Kudu, each converting data for the next):
- Each system has its own internal memory format
- 70-80% of computation wasted on serialization and deserialization
- Similar functionality implemented in multiple projects

With Arrow (one standardized in-memory columnar format):
- All systems utilize the same memory format
- No overhead for cross-system communication
- Projects can share functionality (e.g., a Parquet-to-Arrow reader)
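The Arrow idea fits in a few lines. A minimal sketch, not from the deck, assuming PyArrow and pandas are installed and that an "events.parquet" file exists: one shared Parquet-to-Arrow reader produces a standardized in-memory table, and handing that table to another library is largely a view over the same columnar buffers rather than a serialize/deserialize step.

```python
import pyarrow.parquet as pq

# One shared reader: Parquet bytes -> Arrow columnar memory (file name is hypothetical).
table = pq.read_table("events.parquet")
print(table.schema)

# Other libraries consume the same columnar layout instead of re-serializing;
# to_pandas() converts to a pandas DataFrame, zero-copy where the types allow it.
df = table.to_pandas()
print(df.head())
```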
DATA PROCESSING EVOLUTION
Faster Data Access, Less Data Movement

- Hadoop processing, reading from disk:
  HDFS read -> Query -> HDFS write -> HDFS read -> ETL -> HDFS write -> HDFS read -> ML Train
- Spark in-memory processing (25-100x improvement; less code; language flexible; primarily in-memory):
  HDFS read -> Query -> ETL -> ML Train
- Traditional GPU processing (5-10x improvement; more code; language rigid; substantially on GPU)