
Navigating Data Harmony by Exploring the Power of Apache Iceberg.pptx

Uploaded by: 王** | ID: 171089 | 2024-07-23 | 45 pages | 13.99 MB

Slide text from the preview (first 11 slides):

Navigating Data Harmony by Exploring the Power of Apache Iceberg
Zoe Steinkamp
Copyright 2023, InfluxData

Agenda
- Introduction to Apache Iceberg
- Why it was built + How it works
- Key Benefits of Apache Iceberg
- Migration + Integrations
- Use Cases
- Why InfluxDB is using Iceberg
- Resources

Introduction to Apache Iceberg
Apache Iceberg, an open-source data table format, revolutionizes data management by addressing the inefficiencies of traditional catalogs, improving query performance, and reducing storage costs. It supports ACID transactions, time travel, and SQL-like operations, and it integrates seamlessly with frameworks such as Apache Spark and Apache Flink, making it ideal for large-scale data lakes.

What Iceberg is and is not
Iceberg is:
- A table format specification
- APIs and libraries for interacting with that specification
Iceberg is not:
- A storage engine
- An execution engine (for query/compute)
- A service

When Iceberg is not the right fit
- Small datasets: using Iceberg for a small dataset that doesn't necessitate a data lake might be excessive.
- Real-time data ingestion: out of the box, Apache Iceberg does not support real-time data ingestion because it relies on batch processing.

Why it was built

Case Study: Netflix Atlas performance
- Hive table with Parquet filters: 400k+ splits per day, not combined; Explain query: 9.6 minutes (planning time)
- Iceberg table with partition data filtering: 15,218 splits, combined; 13 min (wall time) / 10 sec (planning)
- Iceberg table with partition and min/max filtering: 412 splits; 42 sec (wall time) / 25 sec (planning)

How it works
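In the Netflix Atlas case study, the split count falls from 400k+ to 412 once both partition and min/max filtering are applied. This is possible because Iceberg's metadata records, for each data file, its partition values and per-column lower/upper bounds, so a planner can discard files without opening them. The sketch below is a toy model of that two-step pruning, assuming a hypothetical `DataFile` record with a day partition and min/max bounds for one `ts` column. It is not Iceberg's planner or API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical per-file metadata, loosely modelled on the partition
    values and column bounds that Iceberg metadata keeps for each file."""
    path: str
    partition_day: str  # partition value, e.g. "2024-07-01"
    ts_min: int         # lower bound of the 'ts' column in this file
    ts_max: int         # upper bound of the 'ts' column in this file


def plan_scan(files: List[DataFile], day: str, ts_lo: int, ts_hi: int) -> List[DataFile]:
    """Return only the files a query on `day` with ts in [ts_lo, ts_hi] must read."""
    # Step 1: partition pruning drops every file from other days.
    in_partition = [f for f in files if f.partition_day == day]
    # Step 2: min/max pruning drops files whose [ts_min, ts_max] range
    # cannot overlap the query range, so they are never opened.
    return [f for f in in_partition if f.ts_max >= ts_lo and f.ts_min <= ts_hi]


if __name__ == "__main__":
    files = [
        DataFile("a.parquet", "2024-07-01", 0, 99),
        DataFile("b.parquet", "2024-07-01", 100, 199),
        DataFile("c.parquet", "2024-07-01", 200, 299),
        DataFile("d.parquet", "2024-07-02", 0, 299),
    ]
    selected = plan_scan(files, "2024-07-01", 120, 180)
    print([f.path for f in selected])  # ['b.parquet']
```

Only one of the four files survives both steps. Applied across a large table, the same idea is what shrinks the number of splits the engine has to schedule.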

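This deck introduces Apache Iceberg, an open-source data table format that transforms data management by addressing the inefficiencies of traditional catalogs, improving query performance, and reducing storage costs. Iceberg supports ACID transactions, time travel, and SQL-like operations. It integrates seamlessly with frameworks such as Apache Spark and Apache Flink, which makes it well suited to large-scale data lakes. The deck covers how Iceberg works, its key benefits, approaches to migration and integration, use cases, and how InfluxDB is joining the Iceberg ecosystem. It also points to resources such as the Iceberg Summit and Apache XTable.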
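Time travel, one of the features listed for Iceberg, follows from the way commits work. Each commit produces a new immutable snapshot that lists the table's complete set of data files, and older snapshots remain readable. The toy model below shows the idea. `ToyTable` and its methods are hypothetical. The sequential snapshot IDs are a simplification, since real Iceberg snapshot IDs are not sequential. In engines, this capability typically surfaces as queries pinned to a snapshot ID or a timestamp; Spark SQL with Iceberg, for example, uses `VERSION AS OF` and `TIMESTAMP AS OF`.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int
    data_files: FrozenSet[str]  # every data file visible in this snapshot


@dataclass
class ToyTable:
    """Toy snapshot log illustrating time travel; not the Iceberg API."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def commit(self, timestamp_ms: int, added=(), removed=()) -> int:
        """Derive a new snapshot from the current one; old snapshots stay untouched."""
        current = self.snapshots[-1].data_files if self.snapshots else frozenset()
        files = (current - frozenset(removed)) | frozenset(added)
        snap = Snapshot(len(self.snapshots) + 1, timestamp_ms, files)
        self.snapshots.append(snap)
        return snap.snapshot_id

    def files(self, snapshot_id: Optional[int] = None,
              as_of_ms: Optional[int] = None) -> FrozenSet[str]:
        """Current files by default, or the files of an older snapshot."""
        if snapshot_id is not None:
            for snap in self.snapshots:
                if snap.snapshot_id == snapshot_id:
                    return snap.data_files
            raise KeyError(f"unknown snapshot {snapshot_id}")
        if as_of_ms is not None:
            # Latest snapshot committed at or before the timestamp
            # (assumes commits are appended in timestamp order).
            older = [s for s in self.snapshots if s.timestamp_ms <= as_of_ms]
            if not older:
                raise ValueError(f"no snapshot at or before {as_of_ms}")
            return older[-1].data_files
        return self.snapshots[-1].data_files if self.snapshots else frozenset()


if __name__ == "__main__":
    t = ToyTable()
    t.commit(1000, added=["f1.parquet"])
    t.commit(2000, added=["f2.parquet"])
    t.commit(3000, added=["f3.parquet"], removed=["f1.parquet"])
    print(sorted(t.files()))               # ['f2.parquet', 'f3.parquet']
    print(sorted(t.files(snapshot_id=1)))  # ['f1.parquet']
    print(sorted(t.files(as_of_ms=2500)))  # ['f1.parquet', 'f2.parquet']
```

Removing `f1.parquet` in the third commit does not affect snapshots 1 and 2, which still reference it. That is why reading "as of" an earlier point returns the earlier state.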
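Related questions:
- Why is Apache Iceberg well suited to large-scale data lakes?
- How does Iceberg support multi-user environments through ACID transactions?
- How can Iceberg be used for data migration and integration?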
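On the question of multi-user ACID support: Iceberg relies on optimistic concurrency. A writer prepares new metadata based on the table version it read, then asks the catalog to atomically swap the table's current-metadata pointer from that version to the new one. If another writer committed first, the swap fails and the writer retries against the new base. The sketch below is a toy model under stated assumptions. `ToyCatalog`, `append_files`, and the integer metadata IDs are hypothetical, and a `threading.Lock` stands in for whatever atomic operation a real catalog provides. Real Iceberg also checks whether a concurrent change conflicts with the operation before reapplying it, which is omitted here.

```python
import itertools
import threading
from typing import Dict, Iterable, Tuple


class ToyCatalog:
    """Hypothetical single-table catalog: one pointer, swapped atomically."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stands in for the catalog's atomicity
        self._ids = itertools.count(1)
        # metadata id -> immutable tuple of data files (like metadata files)
        self._metadata: Dict[int, Tuple[str, ...]] = {0: ()}
        self._current = 0  # id of the table's current metadata

    def current(self) -> int:
        with self._lock:
            return self._current

    def files(self, metadata_id: int) -> Tuple[str, ...]:
        with self._lock:
            return self._metadata[metadata_id]

    def write_metadata(self, files: Iterable[str]) -> int:
        """Store new metadata under a fresh id; existing entries are never overwritten."""
        with self._lock:
            new_id = next(self._ids)
            self._metadata[new_id] = tuple(files)
            return new_id

    def swap(self, expected: int, new: int) -> bool:
        """Atomic compare-and-swap of the current-metadata pointer."""
        with self._lock:
            if self._current != expected:
                return False
            self._current = new
            return True


def append_files(catalog: ToyCatalog, new_files: Iterable[str], max_retries: int = 10) -> int:
    """Optimistic commit: build on the base we read, retry if another writer won."""
    new_files = tuple(new_files)
    for _ in range(max_retries):
        base = catalog.current()
        candidate = catalog.write_metadata(catalog.files(base) + new_files)
        if catalog.swap(base, candidate):
            return candidate
        # Another writer committed first: loop to re-read the new base.
    raise RuntimeError("commit failed: too many concurrent conflicts")


if __name__ == "__main__":
    cat = ToyCatalog()
    writers = [threading.Thread(target=append_files, args=(cat, [f"f{i}.parquet"]))
               for i in range(8)]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    print(sorted(cat.files(cat.current())))  # all eight files, each exactly once
```

Because a swap from a stale base is always rejected, no writer can overwrite another writer's commit. Each append either lands on top of the latest version or is retried.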