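DataFunCon #2024

Tencent Big Data: Intelligent Optimization Practices for the Real-Time Lakehouse
Speaker: Chen Liang, Tencent, Senior Engineer

Contents
Lakehouse Architecture
Intelligent Optimization Service
Scenario-Specific Capabilities
Summary and Outlook

Lakehouse Architecture
Diagram labels: HDFS, COS, Auto Optimize Service, SDK, API
Dimensions: ease of use, real-time performance, query performance, storage cost, operations cost

Intelligent Optimization Service
Compaction Service, Clustering Service, Expiration Service, Cleaning Service, Index Service, Auto Engine Service

Intelligent Optimization Service
Task scheduling
1. Minute/hourly tables: streaming scheduling, continuously triggered, optimization configuration
2. Daily/weekly/monthly tables: Commit Event driven, triggered on demand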
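The two scheduling modes above can be illustrated with a minimal sketch. It assumes the trigger mode is chosen purely from a table's partition granularity. The names `Granularity`, `Trigger`, `Table`, `choose_trigger`, and the sample table names are hypothetical and are not taken from the talk or from any Tencent API.

```python
from dataclasses import dataclass
from enum import Enum


class Granularity(Enum):
    MINUTE = "minute"
    HOUR = "hour"
    DAY = "day"
    WEEK = "week"
    MONTH = "month"


class Trigger(Enum):
    STREAMING = "streaming"        # long-running job that fires continuously
    COMMIT_EVENT = "commit_event"  # fires on demand when a commit event arrives


@dataclass(frozen=True)
class Table:
    # Hypothetical table descriptor; a real service would carry more metadata.
    name: str
    granularity: Granularity


def choose_trigger(table: Table) -> Trigger:
    """Map a table's partition granularity to a trigger mode (assumed rule)."""
    if table.granularity in (Granularity.MINUTE, Granularity.HOUR):
        return Trigger.STREAMING
    return Trigger.COMMIT_EVENT


# Example: build a scheduling plan for two made-up tables.
tables = [
    Table("ods_click_log", Granularity.MINUTE),
    Table("dws_user_daily", Granularity.DAY),
]
plan = {t.name: choose_trigger(t).value for t in tables}
print(plan)
```

Keeping the rule in one small function makes the split between continuously triggered and on-demand tables easy to adjust if the granularity boundary differs in practice.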
Task management
1. Monitor failed jobs and automatically recover and reschedule them
2. Sync optimization status to users, with clear details of optimized and pending tables

Compaction Service - old way
Data path: Columnar, Row, Columnar
Stages: Decompress, Decode, Column to Row, Row to Column, Encode, Compress
Timeliness, resource consumption, service stability

Compaction Service - new way
Apache Parquet File Format Structure
Magic Number: PAR1
Row Group 0
  Column Column1: Page 0, Page 1
  Column Column2: Page 0, Page 1
Row Group 1
  Column Column1: Page 0, Page 1
Footer
  FileMetaData (Version, Schema)
  Row Group1 metadata
    Column Column1 metadata: type, encoding, offsets
    Column Column2 metadata: type, encoding, offsets
  Row Group2 metadata
    Column Column1 metadata: type, encoding, offsets

Compaction Service - new way
[Diagram: Parquet files labelled Magic Number: PAR1, PAR2, and PAR3, each showing row groups (Row Group 0, plus Row Group 1 for PAR3) with columns Column1 and Column2 and pages Page 0 and Page 1. Two merge paths are shown: Page Copy / ReCompress and RowGroup Copy / ReCompress.]

Compaction Service - new way
Page level: merging small files into large files; Bloom filter not enabled; columns unchanged
RowGroup level: merging large files into larger files; only column pruning involved; only recompression involved

Intelligent Optimization Service: production results

Compaction Service - more optimizations
Incremental rewrite strategy
  Split along the timeline based on manifest/data file modify time
  Perform partition-level incremental rewrites according to modify time
Delete file handling: (Data Files with Delete Files LEFT ANTI JOIN Delete Files) UNION ALL (Data Files without Delete Files)
JOIN POS DF and EQ DF (position and equality delete files) so that files are opened only once, avoiding frequent serial Apply
Add a Remove Dangling Deletes action to remove leftover, unreferenced delete files
Apply Bloom Inde