Data engineering with Rust and Delta Lake 1/28

About me
(artist's rendering)
Howdy! My name is R. Tyler Croy.
I helped create the delta-rs project.
I write lots of Rust.
I authored a chapter in Delta Lake: The Definitive Guide.
I help organizations build cloud-native data platforms.
I can help you lower the cost of your Databricks and AWS bills!

Let's define our ...

Delta Lake
Data storage format which is basically:
JSON transaction log files
Apache Parquet data files

In AWS we store Delta tables in S3:
s3://bucket/delta-table
    ds=2024-04-01
        part-00000-d361a60627e3.c000.snappy.parquet
        part-00001-5d1872324d6f.c000.snappy.parquet
    ds=2024-04-02
        part-00000-de0b22b62bbd.c000.snappy.parquet
        part-00001-25f7559cd150.c000.snappy.parquet
    _delta_log

Delta Lake
cat deltatbl-partitioned/_delta_log/

Rust
"Rust is a multi-paradigm, general-purpose programming language that emphasizes performance, type safety, and concurrency. It enforces memory safety, meaning that all references point to valid memory, without a garbage collector."
There are a lot of different ways to use Rust for data engineering and processing, but the big reason we want it is that it allows us to correctly implement high-performance programs with less ...

Our tools
arrow
deltalake
datafusion
and ...

Our tools: arrow
arrow is the foundation for almost all consequential data processing in Rust.
The big things that the arrow-rs project gives us are the in-memory columnar data representation of RecordBatch and a Parquet reader/writer library.
let arrow_array: Vec<Arc<dyn Array>> = v