Alibaba Cloud Open Source Big Data Platform 3.0: Technical Overview (PDF, 18 pages)
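Platform evolution
1.0 (2009-2019): big data moves to the cloud.
2.0 (2020-2022): going cloud native; Flink leads the shift to real-time processing; cloud-native data lake.
3.0 (2023): a new generation of streaming lakehouse; fully Serverless; embracing AI.

Alibaba Cloud Open Source Big Data Platform 3.0, Part 01

Hive (traditional data warehouse): lacks transaction support, scales poorly, and has poor query performance.
Lakehouse: good transaction support, strong system scalability, and rich query functionality.
Table formats on the data lake (OSS/S3): Apache Iceberg, Delta Lake, Apache Hudi, Paimon.

Paimon vs Hudi benchmark (updating and reading 500 million rows ingested into the lake): 4x Upsert, 10x Scan.
Paimon characteristics: low latency, low cost, simple ingestion into the lake, high development efficiency, rich ecosystem.

Flink + Paimon streaming lakehouse
Sources: Database (binlog), Logs.
Layers: ODS, DWD, DWS, ADS, each stored in Paimon and processed in both Streaming and Batch mode.
Properties: unified stream and batch processing, end-to-end real-time pipeline, low cost, open data.
Consumers: Hologres, MaxCompute, Online Serving.
Other diagram labels: Paimon Serverless.

Alibaba Cloud Open Source Big Data Platform 3.0, Part 02: Serverless

Diagram labels: Serverless, ECI, ECS (Virtual Cluster), OSS-HDFS, DLF, HMS.

Serverless Flink
Real-time job development and runtime platform: automatic job resource tuning, full job lifecycle management, intelligent O&M diagnostics, Open API integration, dynamic scaling in and out, end-to-end monitoring and alerting.
Enterprise-grade enhanced Flink compute engine: fine-grained resource allocation, enterprise-grade SQL operator optimization, enterprise-grade data integration and switchover, fast failure recovery, self-developed state storage kernel, dynamic adjustment of parameters and resources.
Benchmark labels (surrounding text lost in extraction): Apache Flink, Flink, 2-3, SQL.

Serverless StarRocks
Architecture labels: SQL Editor; FE nodes; three Virtual Warehouses, each with BE nodes and a Data Cache; Data Lake Table Format and StarRocks Table Format; data lake storage on OSS-HDFS.
Other labels (surrounding text lost in extraction): CBO, QPS, Trino, 3, ELT, 60%, Virtual Warehouse, SR Manager.

Serverless Spark
Labels (surrounding text lost in extraction): CU, Native Spark, 3, Celeborn, PB, Shuffle, OSS-HDFS, DLF, Paimon, Hudi, Iceberg, RBAC, JindoCache.
Architecture labels: Apache Celeborn Remote Shuffle Service; Spark Native Engine; Shuffle; data lake storage on OSS-HDFS.

OSS-HDFS
Access paths: OSS, JindoSDK, HDFS, POSIX; Java API for Spark, Flink, and StarRocks; POSIX for Hive, Tensorflow, and PyTorch.
Labels (surrounding text lost in extraction): JindoFS, HDFS, Serverless HDFS, OSS-HDFS, HDFS, OSS, EB, Open, 200,000 QPS (20w QPS), 10X du/count, HDFS, Spark/Flink/StarRocks.

Alibaba Cloud Open Source Big Data Platform 3.0, Part 03: AI

EMR Doctor
Labels (surrounding text lost in extraction): EMR, 75%, OpenAPI, 30%, Paimon, EMR on ECS, Spark Serverless, StarRocks Serverless, OSS-HDFS.

Flink Advisor
Labels (surrounding text lost in extraction): Flink, 500+, 30+, Failover, HA, Checkpoint, ENI, IP.

AI Pipelines / Pyxis
Architecture labels: OSS storage; four Query Nodes; Scaling; Worker Node; Coordinator Service; Query Coordinator; Data Coordinator; Index Coordi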
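
The Flink + Paimon pipeline in Part 01 ingests database binlogs into the ODS layer, and the Paimon vs Hudi benchmark centers on Upsert. As a rough illustration of what upsert by primary key means for a changelog, here is a minimal sketch in plain Python. It is not the Paimon or Flink API: the event shape (`op`, `key`, `row`), the operation codes, and the function name `apply_changelog` are all hypothetical and chosen only for this example.

```python
from typing import Any, Dict, Iterable, Tuple

# Hypothetical change event: (op, primary_key, row).
# op is "I" (insert), "U" (update) or "D" (delete); these codes are an
# assumption made for this sketch, not a real binlog format.
ChangeEvent = Tuple[str, Any, Dict[str, Any]]


def apply_changelog(
    table: Dict[Any, Dict[str, Any]], events: Iterable[ChangeEvent]
) -> Dict[Any, Dict[str, Any]]:
    """Apply change events in order, keeping one row per primary key."""
    for op, key, row in events:
        if op in ("I", "U"):
            # Upsert: insert the row, or overwrite the row with the same key.
            table[key] = row
        elif op == "D":
            # Delete: drop the row if it exists; ignore unknown keys.
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return table


events = [
    ("I", 1, {"id": 1, "amount": 10}),
    ("I", 2, {"id": 2, "amount": 20}),
    ("U", 1, {"id": 1, "amount": 15}),  # later event wins for key 1
    ("D", 2, {}),                       # key 2 is removed
]

ods = apply_changelog({}, events)
print(ods)  # {1: {'id': 1, 'amount': 15}}
```

The sketch keeps only the latest row per key, which is the simplest possible merge behavior; a real lakehouse table format does this work on files in object storage rather than in an in-memory dictionary.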