Practical Experience Building a Data Warehouse Inside the Data Lake with a Lakehouse Architecture

Contents
1 Background and industry status
2 Reference architecture for in-lake warehousing based on Lakehouse
3 Typical in-lake warehousing scenarios and solutions
4 Future plans

Some common misconceptions about data lakes

Wikipedia's definition
A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and binary data (images, audio, video). A data swamp is a deteriorated and unmanaged data lake that is either inaccessible to its intended users or is providing little value.

AWS definition
A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without having to first structure the data, and run different types of analytics, from dashboards and visualizations to big data processing, real-time analytics, and machine learning, to guide better decisions.

Azure definition
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and do all types of processing and analytics across platforms and languages. It removes the complexities of ingesting and storing all of your data while making it faster to get up and running with batch, streaming, and interactive analytics. Azure Data Lake works with existing IT investments for identity, management, and security for simplified data management and governance. It also integrates seamlessly with operational stores and data warehouses so you can extend current data applications. We've drawn on the experience of working