Greenplum 6: The Ideal Data Platform for Mixed Workloads
高小明

The world's leading open-source MPP big data platform

Scalability vs. ACID transactions
Distributed vs. easy to use
Structured vs. semi-structured and unstructured
Transactional vs. analytical

MPP: Massively Parallel Processing
[Diagram labels: master, standby, primary segment, mirror segment]

Pivotal Confidential: Internal Use Only

Data distribution: the foundation of parallelism
The most important strategy and goal is to distribute data evenly across all data nodes.

[Figure: sample rows of the orders table, spread across the segments]
43   Oct 20 2005   12
64   Oct 20 2005   111
45   Oct 20 2005   42
46   Oct 20 2005   64
77   Oct 20 2005   32
48   Oct 20 2005   12
50   Oct 20 2005   34
56   Oct 20 2005   213
63   Oct 20 2005   15
44   Oct 20 2005   102
53   Oct 20 2005   82
55   Oct 20 2005   55

    CREATE TABLE orders (id serial, order_date timestamp) DISTRIBUTED BY (id);

Generating the parallel query plan

    SELECT customer, amount FROM orders JOIN customer USING (cust_id) WHERE date=2008;

Executing the parallel plan
[Diagram labels: Standby Master, Master Host, Interconnect, Segment Host Node1, Segment Host Node2, Segment Host Node3, Segment Host NodeN]

Greenplum (MPP) vs. Oracle (SMP)

OLAP: Online Analytical Processing

Gartner 2019 data analytics industry report
"Pivotal Greenplum scored highly this year in all four use cases, positioning among the top vendors in all bar the context-independent data warehouse use cases. This reflects one of the major trends in the DMSA market this year: rediscovery. End users are turning to traditional technologies in order to meet their DMSA requirements, and Pivotal Greenplum's strong capabilities here as an MPP relational database are well-showcased."

Outstanding OLAP features
Columnar storage: partitioning, compression
Advanced features: recursive queries, window functions
Integrated analytics: multiple formats, multiple languages
MADlib machine learning: in-database parallel model training, prediction, and classification
ORCA: query optimizer for complex queries
Mature and stable: a complete ecosystem that supports core production systems

Columnar storage tables
Table SALES (columns shown: amount, cust_id); table orders
Better suited to compression
Faster when a query reads only some of the columns
Different columns can use different compression methods

Partitioning
[Diagram labels: Segment 1A, 1B, 1C, 1D; Segment 2A, 2B, 2C, 2D; Segment 3A, 3B, 3C, 3D]

    SELECT COUNT(*) FROM orders WHERE order_date=Oct 1 2007 AND