Dai Xiening: MaxCompute Index Optimization in Practice (19 pages).pdf

Uploaded by: 云闲 (Yunxian)

ID: 84252

2021-01-01

PDF, 19 pages, 3.81 MB


MaxCompute Index Optimization in Practice
Dai Xiening, Senior Expert, Alibaba Cloud

MaxCompute 2.0: The MaxCompute data model
The data model does not define how data is organized within a partition. Can MaxCompute 2.0 improve efficiency by defining data sharding, sorting, and indexes?

MaxCompute 2.0: Hash Clustering
    CREATE TABLE table_name
    CLUSTERED BY (col_name, col_name, ...)
    SORTED BY (col_name ASC|DESC)
    INTO number_of_buckets BUCKETS

MaxCompute 2.0: Range Clustering
    CREATE TABLE table_name
    RANGE CLUSTERED BY (col_name, col_name, ...)
    SORTED BY (col_name ASC|DESC)

MaxCompute 2.0: Index-based query optimization
    SELECT … FROM table_name WHERE id … 3;

    SELECT … FROM table_name WHERE id=1994-01-01 and l_shipdate=0.05 and l_discount=0.07 and l_quantity … 24;

[Charts comparing w/ clustering and w/o clustering: Query Execution Time (s); CPU Usage (cores × minutes); IO Usage (bytes)]

MaxCompute 2.0: Join optimization
    SELECT t1.id, t1.name, t2.name FROM t1, t2 WHERE t1.id = t2.id;

MaxCompute 2.0: TPC-H Q4
    select o_orderpriority, count(*) as order_count
    from tpch_orders o
    join (select distinct l_orderkey from (select * from tpch_lineitem where l_commitdate=1993-07-01 and o.o_orderdate … 1993-10-01
    group by o_orderpriority
    order by o_orderpriority
    limit 999999;

[Charts comparing w/ clustering and w/o clustering: Query Execution Time (s); CPU Usage (cores × minutes); Memory Usage (GB × minutes)]

MaxCompute 2.0 use case: Taobao transaction record lookup
Look up a user's transaction records for the past few days by user ID.
The scanned data volume is very large: 3 TB, 40 billion records.
Selectivity is extremely low: the result set is typically a few dozen to about a hundred records, so fewer than one row in 100 million passes the filter.
QPS is low, but low latency is required to support near-real-time queries.
The original system used full table scans with 1,100 workers and took about 2 minutes per query.

MaxCompute 2.0 use case: Taobao transaction record lookup
The table was rebuilt with Hash Clustering: it is hash-sharded and sorted with the user ID as the key. The same query now needs only 4 mappers, scans 10,000 records, and finishes in 6 seconds.

MaxCompute 2.0 use case: incremental update of Taobao-family transaction tables
The core transaction tables of the Taobao family of businesses must be updated incrementally on a regular schedule: rows from the incremental table are inserted into the full table, or update it when the transaction record already exists there. This is similar to a relational database's INSERT UPDAT…
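The slides describe Hash Clustering as distributing rows into a fixed number of buckets by the CLUSTERED BY column and sorting each bucket by the SORTED BY column, which is what lets the Taobao lookup touch only a small slice of the table. The in-memory Python sketch below illustrates that idea under stated assumptions: the bucket function (CRC32 of the key's decimal text modulo the bucket count), the row layout, and the names `bucket_of`, `build_hash_clustered`, and `point_lookup` are all hypothetical and are not MaxCompute's actual hash function or storage format.

```python
import bisect
import zlib
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (user_id, payload): hypothetical row layout


def bucket_of(key: int, num_buckets: int) -> int:
    """Hypothetical bucket function: CRC32 of the key's decimal text, modulo the bucket count."""
    return zlib.crc32(str(key).encode()) % num_buckets


def build_hash_clustered(rows: List[Row], num_buckets: int) -> Dict[int, List[Row]]:
    """Assign each row to a bucket by hashing its key, then sort every bucket."""
    buckets: Dict[int, List[Row]] = {b: [] for b in range(num_buckets)}
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    for bucket in buckets.values():
        bucket.sort()  # tuples sort by user_id first, then by payload
    return buckets


def point_lookup(buckets: Dict[int, List[Row]], key: int) -> List[Row]:
    """Read only the bucket the key hashes to and binary-search the key's run inside it."""
    bucket = buckets[bucket_of(key, len(buckets))]
    start = bisect.bisect_left(bucket, (key,))    # (key,) sorts before every (key, payload)
    end = bisect.bisect_left(bucket, (key + 1,))  # first row with a larger key (int keys only)
    return bucket[start:end]
```

For example, `point_lookup(build_hash_clustered(rows, 4), 7)` inspects one of the four buckets and returns only the rows whose `user_id` is 7. Sorting within each bucket is what turns the lookup into a binary search rather than a scan of the whole bucket.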
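Range Clustering, also shown in the slides, places contiguous key ranges in the same bucket, so a range predicate such as the date filter on `l_shipdate` can skip buckets whose key range cannot overlap it. The following Python sketch shows that pruning with per-bucket min/max keys. It assumes equal-size cuts of the globally sorted keys and integer keys (for example, dates encoded as yyyymmdd such as 19940101). The names `RangeBucket`, `build_range_clustered`, and `range_scan` are hypothetical, and the sketch does not claim to reproduce how MaxCompute chooses range boundaries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RangeBucket:
    lo: int          # smallest key stored in the bucket
    hi: int          # largest key stored in the bucket
    keys: List[int]  # the bucket's keys, in ascending order


def build_range_clustered(keys: List[int], num_buckets: int) -> List[RangeBucket]:
    """Sort all keys globally, then cut them into contiguous runs of roughly equal size."""
    ordered = sorted(keys)
    size = max(1, -(-len(ordered) // num_buckets))  # ceiling division
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    return [RangeBucket(c[0], c[-1], c) for c in chunks]


def range_scan(buckets: List[RangeBucket], low: int, high: int) -> Tuple[List[int], int]:
    """Return the keys in [low, high) and the number of buckets that had to be read."""
    result: List[int] = []
    buckets_read = 0
    for b in buckets:
        if b.hi < low or b.lo >= high:
            continue  # [b.lo, b.hi] cannot intersect [low, high): skip the bucket unread
        buckets_read += 1
        result.extend(k for k in b.keys if low <= k < high)
    return result, buckets_read
```

With 100 keys cut into 4 buckets, a scan of `[30, 40)` reads a single bucket instead of all four. The min/max check is cheap because it uses only per-bucket metadata, which is the kind of saving the IO and CPU charts compare.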
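The join-optimization slides use `t1.id = t2.id` as the example. A common reason clustering helps here is that when both tables are hash-clustered on the join key with the same bucket count and hash function, bucket k of one table can only match bucket k of the other, and since both buckets are already sorted they can be joined with a sort-merge pass and no shuffle. The Python sketch below illustrates that bucket-wise join under those assumptions. The names `bucketize`, `merge_join`, and `bucketed_join` and the CRC32 bucket function are hypothetical; the source does not detail MaxCompute's join plan.

```python
import zlib
from typing import List, Tuple

Row = Tuple[int, str]  # (id, name): hypothetical row layout


def bucketize(rows: List[Row], n: int) -> List[List[Row]]:
    """Hash-cluster rows on id into n buckets and sort each bucket (hypothetical layout)."""
    buckets: List[List[Row]] = [[] for _ in range(n)]
    for r in rows:
        buckets[zlib.crc32(str(r[0]).encode()) % n].append(r)
    for b in buckets:
        b.sort()
    return buckets


def merge_join(left: List[Row], right: List[Row]) -> List[Tuple[int, str, str]]:
    """Sort-merge equi-join of two id-sorted lists; duplicate ids on either side are allowed."""
    out: List[Tuple[int, str, str]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal ids on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            for a in left[i:i_end]:
                for b in right[j:j_end]:
                    out.append((lk, a[1], b[1]))
            i, j = i_end, j_end
    return out


def bucketed_join(t1: List[List[Row]], t2: List[List[Row]]) -> List[Tuple[int, str, str]]:
    """Join bucket k of t1 only with bucket k of t2; both sides must share the bucketing."""
    if len(t1) != len(t2):
        raise ValueError("both tables must use the same number of buckets")
    out: List[Tuple[int, str, str]] = []
    for b1, b2 in zip(t1, t2):
        out.extend(merge_join(b1, b2))
    return out
```

Each bucket pair is independent, so in a distributed engine the pairs could be processed by separate workers. The sketch only shows why equal bucketing on the join key removes the need to redistribute either table.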
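The last slide describes the incremental update: delta rows are inserted into the full table, or replace the existing row when the transaction is already present. The source text cuts off before explaining how MaxCompute performs this, so the Python sketch below is only an assumption-based illustration. If the full table and the delta are clustered and sorted on the same key, each bucket pair can be upserted with a single linear merge in which the delta row wins on a key clash. The name `upsert_merge` and the `(transaction_id, payload)` schema are hypothetical, and keys are assumed unique within each input.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (transaction_id, payload): hypothetical schema


def upsert_merge(full: List[Record], delta: List[Record]) -> List[Record]:
    """Merge two key-sorted, key-unique lists; on a key clash the delta row replaces the full row."""
    out: List[Record] = []
    i = j = 0
    while i < len(full) and j < len(delta):
        fk, dk = full[i][0], delta[j][0]
        if fk < dk:
            out.append(full[i])   # untouched existing record
            i += 1
        elif fk > dk:
            out.append(delta[j])  # new transaction: insert
            j += 1
        else:
            out.append(delta[j])  # existing transaction: update
            i += 1
            j += 1
    out.extend(full[i:])
    out.extend(delta[j:])
    return out
```

For instance, merging a full bucket `[(1, "a"), (3, "b"), (5, "c")]` with a delta `[(3, "B"), (4, "d")]` yields `[(1, "a"), (3, "B"), (4, "d"), (5, "c")]`. The output stays sorted, so the clustered layout is preserved for the next cycle.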
