Huang Xingbo (Duanchen), Apache Flink Committer, Senior Development Engineer at Alibaba

Introduction to the Next-Generation PyFlink Python Runtime Based on FFI

Agenda
#1 Latest PyFlink features
#2 PyFlink Runtime
#3 FFI-based JCP
#4 PyFlink Runtime 2.0
#5 Future Work

#1 Latest PyFlink features

New features in PyFlink 1.14

#1 Performance
Operator Fusion
State serialization/deserialization optimization
Finish Bundle optimization

#2 Functionality
State TTL config

#3 Usability
Support for uploading tar.gz dependency packages
Profile
Print
Local Debug

#2 PyFlink Runtime

PyFlink Architecture Overview (diagram labels)
Python Table API & SQL, Python DataStream API
Py4J
Table API & SQL (Declarative), DataStream API (Imperative)
Optimizer: Common Rules, Python Rules
JobGraph: Java Operators, Python Operators
Runtime: Java Operators, Python Operators, DataService, StateService, UDF

PyFlink Runtime (diagram labels)
Java Operator (JVM), Python Worker (PVM)
checkpoint handling, watermark handling, state request handling

PyFlink Runtime Workflow

Performance bottlenecks
1. Computation (time spent in the Call UDF step)
2. Serialization/deserialization (of the input data and the UDF results)
3. Communication (inter-process communication between the JVM and the PVM)
Diagram labels: codegen function, cython, custom serializers, generator, cython

The problem of Java and Python calling each other

#3 FFI-based JCP

Approaches for Java and Python calling each other
#1 FFI-based approach
#2 IPC approach
#3 Running Python on the JVM

IPC approach
1. Network communication: py4j, socket, grpc (diagram labels: PySpark Runtime, PyFlink & PySpark Client, Alink Runtime, PyFlink On Beam)
2. Shared memory: Tensorflow On Flink, PyArrow Plasma
Problem: performance

Running Python on the JVM
1. Convert Python to Java: p2j, voc
2. Python interpreter implemented in Java: Jython, Graalvm
Problem: compatibility

FFI

#1 What is FFI

A foreign function interface (FFI) is a mechanism by which a program written in one programming language can call routines or make use of services written in another. This can be done in several ways:
Requiring that guest-language functions which are to be host-language callable be specified or implemented in a particular way, often using a compatibility library of some sort.
Use of a tool to automatically wrap gue