Exploring UDTFs in PySpark
Takuya Ueshin, Haejoon Lee
Data+AI Summit 2024
2024 Databricks Inc. All rights reserved.

Introductions
- Takuya Ueshin, Sr. Software Engineer, Databricks
- Haejoon Lee, Software Engineer, Databricks

Agenda
- What are UDTFs?
- Capabilities and areas for improvement in UDTFs
- Introduction to polymorphism
- Making UDTFs polymorphic
- Example/demo
- Conclusion

What are UDTFs (User-Defined Table Functions)?

What are UDTFs?
Available in Spark 3.5
- Definition: UDTFs (User-Defined Table Functions) in PySpark allow for custom data processing functions that return tables.
- Components: a UDTF class, an eval function, and a terminate function.
[Diagram: UDTF class with eval() and terminate(); labels "1 rows" and "0 rows"]

UDTF Class and eval Function (Standard UDTF)

    from pyspark.sql.functions import udtf

    @udtf(returnType="num: int, squared: int")
    class SquareNumbers:
        def eval(self, start: int, end: int):
            for num in range(start, end + 1):
                yield (num, num * num)

UDTF Class and eval Function (Standard UDTF)

    from pyspark.sql.functions import udtf

    @udtf(returnType="num: