Source: Zou Dan and Liu Yang, "Powering HTAP at ByteDance with Apache Flink" (PDF, 28 pages).
Powering HTAP at ByteDance with Apache Flink
Zou Dan / Liu Yang, Infrastructure Engineers, ByteDance

Agenda
#1 HTAP Architecture
#2 Flink OLAP Optimizer Improvement
#3 Flink OLAP Runtime Improvement
#4 Future Work

#1 HTAP Architecture

The Origin of HTAP
Timeline: 1970, 2009, 2011, 2021
Eras: RDBMS, NoSQL, NewSQL, HTAP
Systems shown (in slide order): MySQL, Oracle, PostgreSQL, Db2, Redis, HBase, Cassandra, MongoDB, Spanner, H-Store, HANA, Aurora, TiDB, Hyper, MemSQL, SnappyData
Diagram labels: Update, Binlog, Query, OLTP

Why ByteDance needs HTAP
Diagram labels: Online, Offline, "Wide" Table, HTAP, Hive, Update, Query, ETL
Delay labels: Day/Hour level delay, Millisecond level delay

HTAP: Unify OLTP + OLAP
Diagram labels: MySQL Proxy, AP Engine, Catalog, Connector, MetaService, HTAP Store, Other HTAP Components, Flink SQL Gateway, Flink Cluster

Why we choose Flink SQL as AP engine
Engine Unification: Streaming, Batch, OLAP
Ecosystem: Abundant Connector
Performance: TPC-DS, Optimizer

Flink SQL/Presto/Spark SQL TPC-DS Benchmark
[Figure: bar chart of All 102 TPC-DS SQLs E2E Execution Time (Seconds); shorter is faster. Series: Flink 1.11, Presto 0.241 (before manual tuning), Presto 0.241 (after manual tuning), Spark 2.3 on Yarn, Spark 3.1 Standalone.]
Testing time: 2020 Sep 20
Data: 1T scale TPC-DS on Hive (ORC format)

HTAP Features
Data visibility: Millisecond level delay
AP Read Snapshot: AP support read consistency
Strong SQL: Support all TPC-DS case
Scalability: Share nothing

Flink SQL Gateway + Session Cluster on K8s
Client Layer: Go Client, Python Client, Java Client
Proxy Layer: Flink SQL Gateway (multiple instances)
Compute Layer: Kubernetes Cluster, Flink Cluster (JM), Flink Cluster (JM)
Storage Layer: Hive, MySQL, HTAP, ES, KV Store
Other label: RPC

Challenges in Flink OLAP
Diagram labels: MySQL Proxy, RestServer, SessionManager, CommandParser, Executor, Set Executor, Explain Executor, Parse, Validate, Optimize, Flink Cluste