Fast, Cheap, and Easy Data Ingestion with AWS Lambda and Delta Lake

About me

(Image: artist's rendering)

Howdy! My name is R. Tyler Croy.
- I helped create the delta-rs project.
- I write lots of Rust.
- I authored a chapter in Delta Lake: The Definitive Guide.
- I help organizations build cloud-native data platforms.
- I can help you lower the cost of your Databricks and AWS bills!

Let's define our

Delta Lake

Data storage format which is basically:
- JSON transaction log files
- Apache Parquet data files

In AWS we store Delta tables in S3:

s3://bucket/delta-table
    ds=2024-04-01
        part-00000-d361a60627e3.c000.snappy.parquet
        part-00001-5d1872324d6f.c000.snappy.parquet
    ds=2024-04-02
        part-00000-de0b22b62bbd.c000.snappy.parquet
        part-00001-25f7559cd150.c000.snappy.parquet
    _delta_log
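To make "JSON transaction log files plus Parquet data files" a little more concrete, here is a minimal sketch of how a reader could work out which Parquet files are currently part of a table by replaying the log. It assumes each commit in `_delta_log` is newline-delimited JSON containing `add` and `remove` actions keyed by `path`. The commit bodies below are hypothetical and trimmed to just that field; real commits carry more metadata, and this sketch ignores checkpoints entirely.

```python
import json


def active_files(commits):
    """Replay commit bodies (oldest first) and return the set of live data files.

    `commits` is an iterable of strings, each one the newline-delimited JSON
    body of a single commit file from _delta_log.
    """
    live = set()
    for body in commits:
        for line in body.splitlines():
            if not line.strip():
                continue  # tolerate blank lines
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
            # other action types (metadata, protocol, ...) do not change the file set
    return live


# Hypothetical commit bodies, for illustration only.
commit_0 = "\n".join([
    json.dumps({"add": {"path": "ds=2024-04-01/part-00000.parquet"}}),
    json.dumps({"add": {"path": "ds=2024-04-01/part-00001.parquet"}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "ds=2024-04-01/part-00000.parquet"}}),
    json.dumps({"add": {"path": "ds=2024-04-02/part-00000.parquet"}}),
])

print(sorted(active_files([commit_0, commit_1])))
```

In practice you would let a library such as delta-rs do this for you; the point is only that the table state is whatever the log says it is, not whatever happens to be sitting in the bucket.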
AWS Lambda

AWS Lambda is an event-driven, serverless Function as a Service. It is designed to enable developers to run code without provisioning or managing servers. It executes code in response to events and automatically manages the computing resources required by that code.

Lambda supports multiple runtimes, but the most important for our discussion are its Python and Rust support.

pip install cargo-lambda

Time is money

Lambda charges based on:
- Execution time: faster is cheaper
- Memory used: smaller is better
- Storage: not really
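The "faster is cheaper, smaller is better" point can be sketched as arithmetic. The function below is a rough estimate, assuming the bill is dominated by GB-seconds of compute plus a per-request fee. The function name and the rates passed in are placeholders for illustration, not quoted prices; look up the current rates for your region and architecture. Free tier and storage are ignored.

```python
def lambda_cost(invocations, avg_duration_ms, memory_mb,
                price_per_gb_second, price_per_million_requests):
    """Rough cost estimate: compute (GB-seconds) plus per-request charges."""
    # GB-seconds = number of runs * seconds per run * GB of configured memory
    gb_seconds = invocations * (avg_duration_ms / 1000) * (memory_mb / 1024)
    compute = gb_seconds * price_per_gb_second
    requests = invocations / 1_000_000 * price_per_million_requests
    return compute + requests


# Placeholder rates, assumed for illustration only.
RATE_GB_SECOND = 0.00001
RATE_MILLION_REQUESTS = 0.20

slow_and_big = lambda_cost(1_000_000, 1000, 1024, RATE_GB_SECOND, RATE_MILLION_REQUESTS)
fast_and_small = lambda_cost(1_000_000, 500, 512, RATE_GB_SECOND, RATE_MILLION_REQUESTS)

print(f"slow and big:   ${slow_and_big:.2f}")
print(f"fast and small: ${fast_and_small:.2f}")
```

Halving the duration and halving the memory each cut the compute portion in half, which is why a small, fast function is attractive for ingestion work that runs millions of times.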
Serverless data processing

Everything we'll discuss can be done in other serverless environments so long as they support:
- event notifications
- triggered execution
- object storage

Could easily be converted to run on Azure or Google Cloud.

Lambda

def lamb