How to Perform ETL on Massive Data Streams on GPUs (如何在 GPU 上进行海量数据流的 ETL 处理.pdf)

No. 29491 | PDF | 42 pages | 3.71 MB | Download credits: VIP only


“Winners are those who went through more iterations of the ‘loop of progress’: going from an idea, to its implementation, to actionable results. So the winning teams are simply those able to run through this loop faster.”
Francois Chollet, creator of Keras

DATA SCIENTIST WORKFLOW
The Average Data Scientist Spends 90+% of Their Time in ETL as Opposed to Training Models
[Figure: a CPU-powered workflow (dataset collection, dataset download, ETL, analysis, training, inference), annotated with “start ETL workflow”, “*#! forgot to add a feature”, “restart ETL workflow”, and “switch to decaf”.]

GPU-ACCELERATED FEATURE ENGINEERING
Results from ACM RecSys Challenge 2020 Winners
[Chart: computation time in seconds for different infrastructure and libraries (Intel Xeon CPU with 20 cores, 1x V100, 4x V100, 4x V100 + UCX), split into target encoding, count encoding, lag features, xgb train & predict, and other; bar labels read 3570, 1020, 270, and 138.]
Source: https:/

GPU-ACCELERATED FEATURE ENGINEERING
Industry Standard Benchmark: Up to 350x Faster Queries; Hours to Seconds!
10TB Results: RAPIDS Running on 16 NVIDIA DGX A100s
An industry-standard data science benchmark consisting of 30 end-to-end queries representing real-world ETL and machine learning workflows, involving both structured and unstructured data. It was run at two different data sizes, 1TB and 10TB.
RAPIDS results on these large-scale data analytics problems at 1TB (2 DGX A100s) and 10TB (16 DGX A100s), measured by total time:
- 1TB: 37.1x average speed-up
- 10TB: 19.5x average speed-up (7x normalized for cost)
Libraries shown: CuPy, BlazingSQL, Numba, Dask

WHY GPUS FOR ETL
Numerous Hardware Advantages (NVIDIA DGX A100 System)
- Thousands of cores with up to ~20 TeraFLOPS of general-purpose compute performance
- Up to 1.5 TB/s of memory bandwidth
- Hardware interconnects for up to 600 GB/s of bidirectional GPU-to-GPU bandwidth
- Can scale up to 8x GPUs in a single node
Almost never run out of compute relative to memory bandwidth!

RAPIDS End-to…
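The RecSys Challenge 2020 chart breaks the feature engineering cost into target encoding, count encoding, and lag features. The following is a minimal CPU-side sketch in plain Python of what those three transforms compute. It is not the winners' code. It assumes a hypothetical toy table of (user, item, timestamp, label) rows, and the smoothing weight is likewise hypothetical. On GPUs, RAPIDS cuDF offers a pandas-like DataFrame API (groupby, merge), so the same logic is normally written as columnar group-by operations rather than Python loops.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows: (user_id, item_id, timestamp, label)
ROWS = [
    ("u1", "a", 1, 1),
    ("u1", "b", 2, 0),
    ("u2", "a", 3, 1),
    ("u1", "a", 4, 0),
    ("u2", "c", 5, 1),
]


def count_encode(values):
    """Count encoding: replace each category with its frequency in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]


def target_encode(categories, targets, smoothing=1.0):
    """Smoothed target (mean) encoding.

    Each category maps to its mean target blended with the global mean,
    weighted by the category's row count. `smoothing` is a hypothetical
    prior weight. This version uses all rows, including each row's own
    target; real pipelines use out-of-fold statistics to avoid leakage.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    encoded = {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }
    return [encoded[c] for c in categories]


def lag_feature(keys, times, values, default=None):
    """Lag feature: the previous value for the same key, in time order."""
    order = sorted(range(len(keys)), key=lambda i: times[i])
    last_seen = {}
    out = [default] * len(keys)
    for i in order:
        out[i] = last_seen.get(keys[i], default)
        last_seen[keys[i]] = values[i]
    return out


users = [r[0] for r in ROWS]
items = [r[1] for r in ROWS]
times = [r[2] for r in ROWS]
labels = [r[3] for r in ROWS]

print(count_encode(items))                 # [3, 1, 3, 3, 1]
print(target_encode(items, labels))
print(lag_feature(users, times, labels))   # [None, 1, None, 0, 1]
```

Each transform is a group-by over one key column followed by a gather back to row order. That access pattern maps well onto GPU columnar libraries, which fits the large GPU speed-ups the chart reports for these steps.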
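The hardware slide closes with the claim that ETL almost never runs out of compute relative to memory bandwidth. A back-of-the-envelope roofline estimate makes that concrete. This sketch is an illustration under stated assumptions. It reuses the slide's approximate figures (~20 TFLOPS, 1.5 TB/s) and applies the standard roofline model, where attainable throughput is min(peak compute, arithmetic intensity x memory bandwidth). The FLOP and byte counts per kernel are illustrative estimates, not measurements.

```python
# Approximate figures quoted on the slide (assumed, not measured).
PEAK_FLOPS = 20e12        # ~20 TFLOPS of general-purpose compute
MEM_BANDWIDTH = 1.5e12    # 1.5 TB/s of memory bandwidth, in bytes per second


def machine_balance(peak_flops, bandwidth):
    """FLOPs the GPU can execute per byte moved to or from memory."""
    return peak_flops / bandwidth


def attainable_flops(intensity, peak_flops=PEAK_FLOPS, bandwidth=MEM_BANDWIDTH):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * bandwidth)


# Illustrative ETL kernels (hypothetical counts):
# summing a float64 column does ~1 add per 8 bytes read.
column_sum_intensity = 1 / 8
# "a * b + c" over three float64 input columns writing one output column:
# 2 FLOPs per 32 bytes moved.
fused_multiply_add_intensity = 2 / 32

balance = machine_balance(PEAK_FLOPS, MEM_BANDWIDTH)
print(f"machine balance: {balance:.1f} FLOP/byte")
for name, ai in [("column sum", column_sum_intensity),
                 ("a*b+c", fused_multiply_add_intensity)]:
    frac = attainable_flops(ai) / PEAK_FLOPS
    bound = "memory-bound" if ai < balance else "compute-bound"
    print(f"{name}: {ai:.3f} FLOP/byte -> {bound}, {frac:.2%} of peak compute")
```

Both kernels sit roughly two orders of magnitude below the machine balance of about 13 FLOP/byte. They saturate the 1.5 TB/s memory system long before the cores are busy, which is why memory bandwidth, rather than FLOPS, is the headline number for GPU ETL.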


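This document (如何在 GPU 上进行海量数据流的 ETL 处理.pdf) was uploaded by the site (account X-iao). The 三个皮匠 report library only provides storage space and only protects how uploaded content is presented; it does not modify or edit the content itself. If this document infringes your copyright or privacy, please notify the 三个皮匠 report library and it will be removed immediately.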
