
4228 - Gluten GPU.pdf

Uploaded by: 竿*** ID: 982940 2025-11-29 7 pages 214.67KB

Gluten GPU backend: Velox + libcudf

A Velox DriverAdapter is used to replace CPU operators with GPU operators that call cuDF's C++ code. This DriverAdapter can be registered at startup by the application using Velox to enable the cuDF backend. Each operator will have a cuDF equivalent. For example, OrderBy will be replaced by CudfOrderBy.

In between CPU operators and GPU operators, another conversion operator is inserted to handle CPU-GPU and GPU-CPU data movement. This allows cuDF operators to be used alongside existing Velox operators. The conversion currently uses Arrow (Velox to Arrow, then Arrow to cuDF). A direct Velox-to-cuDF interop without Arrow may be built in the future for higher performance.

Currently, no custom CUDA kernels are needed for this code. All functionality is implemented in pure C++ calling cuDF, which implements the CUDA kernels.

There is a lot more to say about tuning for performance (GPU batch sizes, CUDA streams, number of Velox drivers, ...) but I'm leaving that out of this document for the moment.

Link: Experimental RAPIDS cuDF Backend for Velox #12412

Wave: Velox on CUDA
- Experimental subproject in Velox to support GPU
- The main logic is in Velox; only the basic CUDA API is used
- gpu::common (memory, event)

[Diagram labels: CUDA API, HashtableBuild(), HashtableProbe(), SQL Operators, Velox, CUDA]

Gluten Architecture

Velox GPU backend
- Validation: cuDF only supports some operators.
- Operator implementation: for now, each operator implementation is copied and rewritten with cuDF. How to sync with Velox? Mostly copy; accept behavior difference.
- Spill support
- Memory: CUDA global pool; cudf::detail::cuda_stream_pool
- Conversion: cuDF to Arrow conversion. Cache all the Velox vectors, combine them into one big vector, convert that to Arrow, then to a cuDF table.
- Insert a format conversion when one operator is not supported, maybe table scan.
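The caching step in the conversion note above (cache the incoming vectors, combine them into one big vector, then convert) can be illustrated with a minimal sketch. It uses only the C++ standard library: `Batch` is a hypothetical stand-in for a Velox vector, `BatchCombiner` and `targetRows` are names invented for this example, and the actual Arrow and cuDF conversion is deliberately left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Velox vector: a single column of ints.
using Batch = std::vector<int>;

// Caches small CPU batches and hands back one combined batch once at least
// `targetRows` rows have accumulated. In the real pipeline the combined
// batch would then go to Arrow and on to a cuDF table (not modeled here).
class BatchCombiner {
 public:
  explicit BatchCombiner(std::size_t targetRows) : targetRows_(targetRows) {
    assert(targetRows > 0);
  }

  // Caches `input`. Returns true and fills `out` when a combined batch
  // is ready; otherwise returns false and leaves `out` untouched.
  bool add(const Batch& input, Batch& out) {
    cached_.push_back(input);
    cachedRows_ += input.size();
    if (cachedRows_ < targetRows_) {
      return false;
    }
    out = flush();
    return true;
  }

  // Concatenates everything cached so far; also used at end of input.
  Batch flush() {
    Batch combined;
    combined.reserve(cachedRows_);
    for (const Batch& b : cached_) {
      combined.insert(combined.end(), b.begin(), b.end());
    }
    cached_.clear();
    cachedRows_ = 0;
    return combined;
  }

 private:
  std::size_t targetRows_;
  std::size_t cachedRows_ = 0;
  std::vector<Batch> cached_;
};
```

With `BatchCombiner c(5)`, two 2-row batches are held back and the third 2-row batch triggers a single 6-row output. Fewer, larger batches mean fewer host-to-device transfers, which is presumably why the slides mention combining before conversion.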

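Based on the marked content, the full document mainly covers how the Velox GPU backend is used and implemented:
1. A DriverAdapter replaces CPU operators with GPU operators that call cuDF's C++ code.
2. Each operator has a cuDF counterpart, e.g. OrderBy maps to CudfOrderBy.
3. Conversion operators are inserted to handle CPU-to-GPU and GPU-to-CPU data movement.
4. The conversion currently goes through Arrow; a direct Velox-to-cuDF interop may be implemented in the future.
5. No custom CUDA kernels are needed; all functionality is implemented in pure C++ that calls cuDF's CUDA kernels.
6. Performance tuning involves GPU batch sizes, CUDA streams, the number of Velox drivers, and so on.
7. CUDA APIs such as gpu::common and Hashtable are supported.
8. cuDF must be registered to enable the backend; the registration entry point is `void registerCudf();`.
9. Substrait plans can be converted and validated, and Velox plans can be converted to cuDF nodes.
10. Output format conversion: unsupported operators may need to go through a conversion.
11. Encodings are supported; CudfVector extends RowVector but is not identical to it.
12. Links to related documentation are provided.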
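Points 1 to 3 of the summary (swap each supported operator for its cuDF counterpart and insert a conversion wherever the pipeline crosses between CPU and GPU) can be sketched on a plain list of operator names. This is an illustration only, not the Velox DriverAdapter API: `adaptPipeline`, `kToGpu`, `kToCpu`, and the `gpuSupported` set are hypothetical names, and real operators are objects rather than strings.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical names for the conversion operators (illustration only).
const std::string kToGpu = "CpuToGpuConversion";
const std::string kToCpu = "GpuToCpuConversion";

// Rewrites a linear operator pipeline. Operators found in `gpuSupported`
// get a "Cudf" prefix (OrderBy becomes CudfOrderBy); a conversion operator
// is inserted at every CPU/GPU boundary. The pipeline is assumed to start
// and end on the CPU.
std::vector<std::string> adaptPipeline(
    const std::vector<std::string>& ops,
    const std::set<std::string>& gpuSupported) {
  std::vector<std::string> result;
  bool onGpu = false;
  for (const std::string& op : ops) {
    assert(!op.empty());
    const bool wantGpu = gpuSupported.count(op) > 0;
    if (wantGpu && !onGpu) {
      result.push_back(kToGpu);  // data moves host -> device
    }
    if (!wantGpu && onGpu) {
      result.push_back(kToCpu);  // data moves device -> host
    }
    result.push_back(wantGpu ? "Cudf" + op : op);
    onGpu = wantGpu;
  }
  if (onGpu) {
    result.push_back(kToCpu);  // final results are returned on the CPU
  }
  return result;
}
```

For the pipeline `TableScan, OrderBy, Limit` with only `OrderBy` marked as supported, the sketch yields `TableScan, CpuToGpuConversion, CudfOrderBy, GpuToCpuConversion, Limit`. Adjacent GPU operators share one pair of conversions, so the more operators the validation step accepts, the fewer transfers the plan pays for.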