Product safe harbor statement

This information is provided to outline Databricks' general product direction and is for informational purposes only. Customers who purchase Databricks services should make their purchase decisions relying solely upon services, features, and functions that are currently available. Unreleased features or functionality described in forward-looking statements are subject to change at Databricks' discretion and may not be delivered as planned or at all.

2024 Databricks Inc. All rights reserved.

Introducing the New Python Data Source API in Apache Spark

Allison Wang, Sr. Software Engineer, Databricks
Ryan Nienhuis, Sr. Staff Product Manager, Databricks

Agenda: Exploring the new Python Data Source API in Apache Spark

Introduction
Why Python Data Source API?
Deep Dive into the Python Data Source API
Demo
Data Source Reader
Data Source Writer
Streaming APIs
Q&A

Custom Integrations in Spark

You have a couple of options, each with a drawback:

1. Use foreachBatch/foreach for streaming workloads. Drawback: foreachBatch code is powerful but very hard to write well.
2. Build a custom integration in Scala/Java using the DataSource V2 API. Drawback: flexible, but there is no API for Python developers.
3. Don't build one; get the data into Delta using a custom app. Drawback: added cost and latency from copying the data.
4. Import a library. Drawback: not optimized for Spark.

How do I simply read and write data?