
Navigating Data Harmony by Exploring the Power of Apache Iceberg
Zoe Steinkamp
Copyright 2023, InfluxData

Agenda
- Introduction to Apache Iceberg
- Why it was built + How it works
- Key Benefits of Apache Iceberg
- Migration + Integrations
- Use Cases
- Why InfluxDB is using Iceberg
- Resources

Introduction to Apache Iceberg

Apache Iceberg, an open-source data table format, revolutionizes data management by addressing the inefficiencies of traditional catalogs, improving query performance and reducing storage costs. It supports ACID transactions, time travel, and SQL-like operations, and it integrates seamlessly with frameworks such as Apache Spark and Apache Flink, which makes it well suited to large-scale data lakes.

What Iceberg is and is not
Iceberg is:
- A table format specification
- APIs and libraries for interacting with that specification
Iceberg is not:
- A storage engine
- An execution engine (for query/compute)
- A service

When Iceberg is not the right fit
- Small datasets: Using Iceberg for a small dataset that doesn't need a data lake may be overkill.
- Real-time data ingestion: Out of the box, Apache Iceberg does not support real-time data ingestion, because it relies on batch processing.

Why it was built

Case Study: Netflix Atlas Performance
- Hive table with Parquet filters: 400k+ splits per day, not combined; explain query: 9.6 minutes (planning time)
- Iceberg table with partition data filtering: 15,218 splits, combined; 13 min (wall time) / 10 sec (planning)
- Iceberg table with partition and min/max filtering: 412 splits; 42 sec (wall time) / 25 sec (planning)

How it works
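The Netflix case study above shows split counts dropping as more filtering happens at planning time: partition values eliminate whole partitions, and per-file min/max column statistics (which Iceberg records in its manifest metadata) eliminate files whose value ranges cannot satisfy the predicate. The following is a minimal sketch of that pruning idea, assuming a toy in-memory list of file entries. `DataFile`, `plan_files`, the `ts` column, and the file paths are hypothetical names chosen for illustration, the data is invented, and this is not Iceberg's planner or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a manifest entry: one data file plus its stats."""
    path: str
    partition_day: str  # partition value, e.g. "2023-01-01"
    min_ts: int         # lower bound of the `ts` column in this file
    max_ts: int         # upper bound of the `ts` column in this file


def plan_files(files, day, ts_low, ts_high, use_min_max=True):
    """Return the files a query for `day` and ts in [ts_low, ts_high] must read.

    Step 1 keeps only files in the requested partition. Step 2 (optional)
    drops files whose [min_ts, max_ts] range cannot overlap [ts_low, ts_high].
    """
    selected = [f for f in files if f.partition_day == day]
    if use_min_max:
        selected = [f for f in selected if f.max_ts >= ts_low and f.min_ts <= ts_high]
    return selected


# Invented table: 3 day partitions x 4 files each; file i covers ts i*100 .. i*100+99.
files = [
    DataFile(f"day={d}/part-{i}.parquet", d, i * 100, i * 100 + 99)
    for d in ("2023-01-01", "2023-01-02", "2023-01-03")
    for i in range(4)
]

partition_only = plan_files(files, "2023-01-02", 150, 180, use_min_max=False)
with_min_max = plan_files(files, "2023-01-02", 150, 180)

print(len(files), len(partition_only), len(with_min_max))  # 12 4 1
```

Partition filtering alone keeps every file in the matching partition; the min/max check then discards the three files whose `ts` ranges cannot contain matching rows. The check is conservative: a file whose range overlaps the predicate is still read even if it turns out to hold no matching rows, so pruning never drops data a query needs.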
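The introduction notes that Iceberg supports ACID transactions and time travel. One way to picture the mechanism behind both is that each commit produces a new immutable snapshot of the table's file set, and the commit is published by a single atomic swap of the "current snapshot" pointer, which fails if another writer got there first. The sketch below is a hypothetical in-memory toy under that assumption: `ToyTable`, `commit`, `files`, and `CommitConflict` are illustrative names, not Iceberg or PyIceberg APIs, and manifests, metadata files, the catalog, and storage are all left out.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer committed first (optimistic concurrency)."""


@dataclass
class ToyTable:
    """Hypothetical in-memory model of a snapshot-based table."""
    snapshots: dict = field(default_factory=lambda: {0: frozenset()})
    current_id: int = 0

    def commit(self, base_id, added=(), removed=()):
        # The writer builds a new snapshot from the one it read (base_id).
        # The commit succeeds only if that base is still the current snapshot.
        if base_id != self.current_id:
            raise CommitConflict(f"base {base_id} is stale; current is {self.current_id}")
        new_files = (self.snapshots[base_id] - set(removed)) | set(added)
        new_id = max(self.snapshots) + 1
        self.snapshots[new_id] = frozenset(new_files)
        self.current_id = new_id  # the single pointer swap that publishes the commit
        return new_id

    def files(self, snapshot_id=None):
        # Time travel: read any retained snapshot, defaulting to the current one.
        sid = self.current_id if snapshot_id is None else snapshot_id
        return sorted(self.snapshots[sid])


table = ToyTable()
s1 = table.commit(0, added=["a.parquet", "b.parquet"])
s2 = table.commit(s1, added=["c.parquet"], removed=["a.parquet"])

print(table.files())    # ['b.parquet', 'c.parquet']
print(table.files(s1))  # ['a.parquet', 'b.parquet']

try:
    table.commit(s1, added=["d.parquet"])  # stale base: s2 is already current
except CommitConflict as err:
    print("conflict:", err)  # conflict: base 1 is stale; current is 2
```

Because old snapshots stay addressable, reading `s1` after `s2` has been committed still returns the earlier file set, and a failed commit leaves the table unchanged. Real Iceberg writers that hit a conflict generally retry by re-applying their changes on top of the new current snapshot; this toy simply raises.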
