"Winners are those who went through more iterations of the 'loop of progress' - going from an idea, to its implementation, to actionable results. So the winning teams are simply those able to run through this loop faster."
- Francois Chollet, creator of Keras

DATA SCIENTIST WORKFLOW
The Average Data Scientist Spends 90+% of Their Time in ETL as Opposed to Training Models

[Diagram: a CPU-powered workflow timeline - dataset collection, dataset download, analysis, training, inference - repeatedly interrupted by "forgot to add a feature, restart ETL workflow" loops, overnight waits, and "switch to decaf".]
GPU-ACCELERATED FEATURE ENGINEERING
Results from ACM RecSys Challenge 2020 Winners

[Chart: computation time in seconds for the winning pipeline (count encoding, target encoding, lag features, XGBoost train & predict, other) on different infrastructure and libraries]
- Intel Xeon CPU (20 cores): 3,570 s
- 1x V100: 1,020 s
- 4x V100: 270 s
- 4x V100 + UCX: 138 s
Source: https:/
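For context, the feature types in the chart (count encoding, target encoding, lag features) are ordinary groupby, merge, and shift operations, which is what makes them straightforward to move onto the GPU with cuDF. Below is a minimal sketch of that pattern; the toy DataFrame and the column names user_id, timestamp, and engaged are illustrative assumptions, not the winners' actual code.

# Minimal cuDF sketch (illustrative data): count encoding, target encoding,
# and a per-user lag feature, i.e. the feature types listed in the chart above.
import cudf

df = cudf.DataFrame({
    "user_id":   [1, 1, 2, 2, 2, 3],
    "timestamp": [10, 20, 5, 15, 25, 7],
    "engaged":   [0, 1, 1, 0, 1, 0],   # binary target
})

# Count encoding: number of rows contributed by each user.
counts = df.groupby("user_id").agg({"engaged": "count"}).reset_index()
counts = counts.rename(columns={"engaged": "user_count"})
df = df.merge(counts, on="user_id", how="left")

# Target encoding: mean target per user (no out-of-fold smoothing in this sketch).
target_enc = df.groupby("user_id").agg({"engaged": "mean"}).reset_index()
target_enc = target_enc.rename(columns={"engaged": "user_target_enc"})
df = df.merge(target_enc, on="user_id", how="left")

# Lag feature: each user's previous engagement, ordered by time.
df = df.sort_values(["user_id", "timestamp"]).reset_index(drop=True)
df["prev_engaged"] = df.groupby("user_id")["engaged"].shift(1)

print(df)

The same groupby/merge/shift pattern applies unchanged at competition scale; only the size of the DataFrame changes.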
GPU-ACCELERATED FEATURE ENGINEERING
Industry Standard Benchmark: Up to 350x Faster Queries; Hours to Seconds!

10 TB results: RAPIDS running on 16 NVIDIA DGX A100s.

An industry-standard data science benchmark consisting of 30 end-to-end queries representing real-world ETL and machine learning workflows, involving both structured and unstructured data. It was run at two different data sizes, 1 TB and 10 TB.

RAPIDS results at 1 TB (2 DGX A100s) and 10 TB (16 DGX A100s) on these large-scale data analytics problems:
- 1 TB: 37.1x average speed-up
- 10 TB: 19.5x average speed-up (7x normalized for cost)

Powered by the RAPIDS ecosystem: CuPy, BlazingSQL, Numba, and Dask.
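The multi-GPU and multi-node results above rely on Dask to partition DataFrames across GPUs, with each partition processed by cuDF. Below is a minimal single-node sketch of that setup; the cluster configuration, the Parquet path, and the column names store_id, day, and amount are placeholder assumptions, not the benchmark implementation.

# Minimal multi-GPU ETL sketch with Dask + dask_cudf (placeholder path and columns).
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

if __name__ == "__main__":
    # One Dask worker per visible GPU (up to 8 in a single DGX A100 node).
    cluster = LocalCUDACluster()
    client = Client(cluster)

    # Each partition is a cuDF DataFrame resident in GPU memory.
    ddf = dask_cudf.read_parquet("transactions/*.parquet")

    # A representative ETL aggregation, executed on all GPUs in parallel.
    daily_totals = (
        ddf.groupby(["store_id", "day"])
           .agg({"amount": "sum"})
           .reset_index()
    )
    print(daily_totals.compute().head())

    client.close()
    cluster.close()

The "4x V100 + UCX" configuration in the earlier chart applies the same idea at shuffle time: with dask-cuda, worker-to-worker traffic can be moved off the default TCP transport and onto UCX (which can use NVLink) by constructing the cluster as LocalCUDACluster(protocol="ucx").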
WHY GPUS FOR ETL
Numerous Hardware Advantages: NVIDIA DGX A100 System
- Thousands of cores with up to ~20 TeraFLOPS of general-purpose compute performance
- Up to 1.5 TB/s of memory bandwidth
- Hardware interconnects for up to 600 GB/s of bidirectional GPU-to-GPU bandwidth
- Can scale up to 8x GPUs in a single node
- Almost never run out of compute relative to memory bandwidth!

RAPIDS End-to