Variant Data Type - Making Semi-Structured Data Fast and Simple.pdf


Variant Data Type: Making Semi-Structured Data Fast and Simple
Gene Pang, Chenhao Li
2024-06-13
© 2024 Databricks Inc. All rights reserved

OUTLINE
- Motivation
- Variant Data Type Overview
- Using Variant
- Deep Dive: Variant Binary Format
- Performance

Semi-Structured Data in the Lakehouse
- Semi-structured data is partially structured
- Doesn't fully adhere to the relational table model
- Schema may be unknown, incompatible, or evolving
- JSON is a very popular semi-structured data format
- Flexible, and supported in most programming languages
- How do we store and process semi-structured data in the lakehouse?

Option 1: Schema Inference
- On ingestion, read data and infer schema (structs, arrays, scalars, etc.)
- Read queries use the relational schema
- Performance same as structured/relational data

Challenges with Schema Inference: TOO STRICT
- Inference must determine a schema that works with all the data
- If data is diverse, can produce huge, but sparse schemas
- Schema enforcement is strict
- Incoming data must be compatible with schema
- Accessing a missing field may produce exceptions

Option 2: Treat Data as String
- On ingestion, data is stored as string
- No schema enforcement on ingestion
- Read queries parse the string during execution
- Maximum flexibility for any data

- Parsing String in queries is slow
- Typically, data is read more than it is written, so expensive parsing is repeated for every q
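
The trade-off between the two options above can be sketched in plain Python. This is only an illustration using the standard `json` module, not how a lakehouse engine implements either option. The sample records and the helper names (`infer_schema`, `fill_ratio`, `query_strings`) are hypothetical and do not come from the slides.

```python
import json

# Hypothetical sample records; the field names are invented for illustration.
RAW = [
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2, "tags": ["x", "y"]}',
    '{"id": 3, "price": 9.5}',
]


def infer_schema(rows):
    """Option 1 (sketch): union of the top-level fields seen across all rows.

    Type conflicts are ignored here; the first type seen for a field wins.
    """
    schema = {}
    for row in rows:
        for key, value in json.loads(row).items():
            schema.setdefault(key, type(value).__name__)
    return schema


def fill_ratio(rows, schema):
    """Fraction of (row, column) cells that actually hold a value."""
    filled = sum(len(json.loads(row)) for row in rows)
    return filled / (len(rows) * len(schema))


def query_strings(rows, field):
    """Option 2 (sketch): rows stay as strings, so each query re-parses every row."""
    return [json.loads(row).get(field) for row in rows]


schema = infer_schema(RAW)
print(schema)                       # one column per distinct field
print(fill_ratio(RAW, schema))      # how sparse the inferred table is
print(query_strings(RAW, "price"))  # parsing cost is paid on every call
```

With diverse records the inferred schema grows with every new field while most cells stay empty. The string approach avoids that, but pays for a full parse each time the data is read.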
