Next Generation of Tencent OLAP Engine
Tencent TEG LongYue

CONTENTS
01 Background
02 Storage: Various Columns and Indexes
03 Computation: Integrated with Presto
04 Benchmark and Applications

Background

MercsDB
• Data:
  1. Thousands of columns, 10 billion rows
  2. Index on arbitrary column(s)
  3. Real-time write
  4. Both row-oriented and column-oriented
  5. Different indexes
• Performance:
  1. Second-level response on 10-billion-row queries
  2. MPP: both ad-hoc query and real-time query
  3. Real-time write: 100 billion rows/day
!= ElasticSearch, Presto/Impala, ClickHouse

History
Versions: MercsDB, Hermes 1.0, Hermes 2.0, Hermes 3.0
Dates: 2013.1, 2016.12, 2019.6, 2021.12
Milestones (in slide order): Mass Data, Real-Time, BI: Use Pictures, Inverted Index, Basic OLAP, Log Analysis, Introduce Spark, Full OLAP, Ads, Introduce Presto, Vectorization, Focus on Performance

Current Status
Clusters: 5k+ nodes
Query: 10M/day
Storage: total 100 PB, daily 1 PB
Peak IO: 100M rows/s

Basic Arch
[Architecture diagram; labels: LocalFS, Compute Engine, DataStore, Worker, Query Execution Engine, DataReader, HDFS, OZone, CEPH]
Distributed vs Local
HA vs Low Latency
Hot Data vs Cold Data
Replica management
Auto Disaster Tolerance
Isomeric Arch
Low Latency
Loss Service
LRUCache
MMAP

Column-Oriented & Index
Q1: High QPS?
Q2: Second-level response for 10 billion rows?
Q3: Cost-friendly for mass data?
• Columns:
  1. Retrieval: low latency
  2. Sorted: mass data
  3. Compressed: cost-friendly
  4. Nested: supports Parquet
• Indexes:
  1. SparseIndex
  2. SkipListIndex
  3. InvertedIndex
  4. KeyIndex
  5. LBSIndex

Retrieval Column
• Application
  1. Low latency
  2. Medium data
  3. Simple query
• Implementation
  1. Storage-Time
  2. Dictionary index
Index size / original data = 40%

Sorted Column
• Application
  1. Mass data (much bigger than memory)
  2. No other accelerations
• Implementation
  1. Sorted
  2. Index on both data and offset
Improvement: 10x speed-up

Compressed Column
• Application
  1. High cardinality
  2. Di