3rd China Rust Developer Conference

Apache Ballista Introduction
钟阳红 (John Zhong)
Software Engineer, eBay
nju_yaho@apache.org

Agenda
- Overview
- Cluster Setup
- SQL Execution
- Data Cache
- Future

Overview
Apache Ballista is a distributed SQL query engine powered by the Rust implementation of Apache Arrow and DataFusion. It is mainly intended for low-latency interactive queries.
- Supports DAG execution and fault tolerance
- Supports data exchange
- Supports different kinds of object stores, such as HDFS, S3, and Azure
- Supports data caching and cache-aware task scheduling

Cluster Setup
The cluster consists of one scheduler and a number of executors.
Both the scheduler and the executors can be deployed on Kubernetes (K8s).
Executors can be added to the cluster flexibly by registering with the cluster scheduler.

SQL Execution
- SQL -> DAG (Directed Acyclic Graph)
- DAG State Machine
- Task Assignment
- Event Loop based Processing

SQL Execution: DAG Generation
SQL -> Logical Plan -> Single Machine Execution Plan -> Distributed Execution Plan -> DAG

SQL Execution: DAG State Machine
[Figure: Normal Stage State Machine]

SQL Execution: Fault Tolerance
[Figure: Stage State Machine for Executor Lost]

SQL Execution: Task Assignment
- Task: each execution stage covers a number of data partitions, with one task per data partition.
- Executor slot: each executor has a number of slots for task execution.
One round of task assignment binds as many pending tasks as possible to available executor slots.
Two assignment policies:

| Policy | Result of One Round |
| --- | --- |
| Round-robin | Job_a: 1 slot from executor_3, 1 slot from executor_2. Job_b: 3 slots from executor_3, 2 slots from executor_2, 2 slots from executor_1 |
| Bias | Job_a: 2 slots from executor_3. Job_b: 5 slots from executor_3, 2 slots from executor_2 |

SQL Execution: Event Loop based Processing
Advantages:
- Decoupled
- Efficient processing for batch events

Data Cache
Data cache is a very common feature for the