Slides: 朱霜-rust-china-conf-2023.pdf (Rust China Conf 2023, 26 pages).
Build a lightweight logging and tracing tool with Apache Arrow, Parquet and DataFusion
朱霜, 2023.06.18

Contents
1. Introduction
2. Duo - Observability duet: Logging and Tracing
  What is Duo?
  How does it work?
3. Apache Arrow, Parquet and DataFusion
  A brief introduction to Arrow, Parquet, and DataFusion
  How does Duo store and query log and span data?
4. The vision of Duo

Introduction
ID: Folyd
GitHub: folyd
Blog: https:/
Work: ByteDance (Engine)

Duo - Observability duet: Logging and Tracing
https:/
Typical logging and tracing stack: powerful but complex
Duo: less powerful, just enough
- No storage backend dependencies
- One single command
- Easy client integration
- Suitable for local development
- Easy to install and upgrade

Duo - How to reduce complexity?

Duo - How does it work?

Apache Arrow, Parquet, and DataFusion

Apache Arrow
- Created by Wes McKinney, creator of pandas (2016)
- A language-independent columnar memory format
- Supports zero-copy reads, giving fast data access without serialization overhead
- Single instruction, multiple data (SIMD), vectorized processing, and vectorized querying
- Adopted by OLAP and data warehouse systems

Apache Arrow: Field, Array, Schema, RecordBatch

Apache Arrow: computation kernels on Arrow arrays

Apache Parquet
- Free and open-source file format
- Language agnostic
- Column-based format
- Designed for analytics (OLAP) use cases
- Highly efficient data compression and decompression
- Supports complex data types and advanced nested data structures

Querying Parquet with Millisecond Latency: https:/

  logs
    logs1.parquet
    logs2.parquet
    logs3.parquet
  0 directories, 3 files

  $ du -h logs
  1.4G
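The Arrow slides name Field, Array, Schema and RecordBatch as the building blocks, and they mention computation kernels that run over arrays. The sketch below shows how those pieces fit together, using only std Rust:

- a schema made of typed fields;
- columns stored as contiguous buffers, each with a validity mask;
- a record batch that checks its columns against the schema;
- two kernels: a comparison that produces a boolean mask, and a filter that applies that mask.

Every type and function here is a hypothetical simplification, not the arrow-rs API. The example columns used in the test (`duration_ms`, `message`) are assumptions about span data, not Duo's real schema.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for Arrow's Field / Schema / Array /
// RecordBatch. These are NOT the arrow-rs types, only the same ideas.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType { Int64, Utf8 }

#[derive(Debug, Clone, PartialEq)]
pub struct Field { pub name: String, pub data_type: DataType, pub nullable: bool }

#[derive(Debug, Clone, PartialEq)]
pub struct Schema { pub fields: Vec<Field> }

// One column: contiguous values plus a validity mask (true = non-null).
// Real Arrow bit-packs validity and stores strings as an offsets buffer
// plus a single byte buffer; Vec<bool> and Vec<String> keep this short.
#[derive(Debug, Clone, PartialEq)]
pub enum Array {
    Int64 { values: Vec<i64>, validity: Vec<bool> },
    Utf8 { values: Vec<String>, validity: Vec<bool> },
}

impl Array {
    pub fn len(&self) -> usize {
        match self {
            Array::Int64 { values, .. } => values.len(),
            Array::Utf8 { values, .. } => values.len(),
        }
    }
    pub fn data_type(&self) -> DataType {
        match self {
            Array::Int64 { .. } => DataType::Int64,
            Array::Utf8 { .. } => DataType::Utf8,
        }
    }
}

// A batch of equal-length columns that share one schema.
#[derive(Debug, Clone)]
pub struct RecordBatch { pub schema: Arc<Schema>, pub columns: Vec<Array> }

impl RecordBatch {
    // Rejects batches whose columns disagree with the schema or with each other.
    pub fn try_new(schema: Arc<Schema>, columns: Vec<Array>) -> Result<Self, String> {
        if schema.fields.len() != columns.len() {
            return Err("column count does not match schema".to_string());
        }
        for (field, col) in schema.fields.iter().zip(&columns) {
            if field.data_type != col.data_type() {
                return Err(format!("type mismatch in column {}", field.name));
            }
        }
        let rows = columns.first().map_or(0, Array::len);
        if columns.iter().any(|c| c.len() != rows) {
            return Err("columns have different lengths".to_string());
        }
        Ok(RecordBatch { schema, columns })
    }

    pub fn num_rows(&self) -> usize {
        self.columns.first().map_or(0, Array::len)
    }
}

// Comparison kernel: one pass over the values buffer, producing a boolean
// mask. Null slots never match (Arrow itself would yield null here).
pub fn gt_scalar(array: &Array, rhs: i64) -> Result<Vec<bool>, String> {
    match array {
        Array::Int64 { values, validity } => {
            Ok(values.iter().zip(validity).map(|(v, ok)| *ok && *v > rhs).collect())
        }
        Array::Utf8 { .. } => Err("gt_scalar expects an Int64 column".to_string()),
    }
}

// Filter kernel: keeps the rows whose mask entry is true, column by column.
pub fn filter_batch(batch: &RecordBatch, mask: &[bool]) -> RecordBatch {
    fn keep<T: Clone>(xs: &[T], mask: &[bool]) -> Vec<T> {
        xs.iter().zip(mask).filter(|(_, m)| **m).map(|(x, _)| x.clone()).collect()
    }
    assert_eq!(mask.len(), batch.num_rows(), "mask length must equal row count");
    let columns = batch
        .columns
        .iter()
        .map(|col| match col {
            Array::Int64 { values, validity } => {
                Array::Int64 { values: keep(values, mask), validity: keep(validity, mask) }
            }
            Array::Utf8 { values, validity } => {
                Array::Utf8 { values: keep(values, mask), validity: keep(validity, mask) }
            }
        })
        .collect();
    RecordBatch { schema: Arc::clone(&batch.schema), columns }
}
```

In arrow-rs the real counterparts are:

- `arrow::datatypes::{DataType, Field, Schema}`
- `arrow::record_batch::RecordBatch::try_new`
- the compute kernels under `arrow::compute`, such as `filter_record_batch`

Their exact signatures vary between arrow-rs versions.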
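The Parquet slide credits the format with highly efficient compression. Part of that comes from encodings applied before any general-purpose codec: a low-cardinality column, such as a log level, can be dictionary-encoded, and the resulting indices then run-length encoded.

The sketch below is a simplified illustration of those two ideas on a hypothetical column of log levels, and the helper names are made up. Parquet's real encodings are more involved: dictionary pages, an RLE/bit-packing hybrid for the indices, and then a codec such as Snappy or ZSTD.

```rust
use std::collections::HashMap;

// Hypothetical helper: dictionary-encodes a string column. Each distinct
// value is stored once; every row becomes a small integer index into it.
pub fn dictionary_encode(values: &[&str]) -> (Vec<String>, Vec<u32>) {
    let mut dict: Vec<String> = Vec::new();
    let mut lookup: HashMap<&str, u32> = HashMap::new();
    let mut indices = Vec::with_capacity(values.len());
    for &v in values {
        // Reuse the existing index, or append a new dictionary entry.
        let idx = *lookup.entry(v).or_insert_with(|| {
            dict.push(v.to_string());
            (dict.len() - 1) as u32
        });
        indices.push(idx);
    }
    (dict, indices)
}

// Hypothetical helper: run-length encodes the indices as (value, run length).
pub fn run_length_encode(indices: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &i in indices {
        if let Some(last) = runs.last_mut() {
            if last.0 == i {
                last.1 += 1;
                continue;
            }
        }
        runs.push((i, 1));
    }
    runs
}

// Expands the runs through the dictionary back into the original column.
pub fn decode(dict: &[String], runs: &[(u32, usize)]) -> Vec<String> {
    runs.iter()
        .flat_map(|&(i, n)| std::iter::repeat(dict[i as usize].clone()).take(n))
        .collect()
}
```

For example, six rows of "INFO"/"WARN" become a two-entry dictionary and three runs. That is why repetitive log columns tend to shrink well in a columnar layout.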
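The Parquet section ends with the "Querying Parquet with Millisecond Latency" reference and a `logs` directory holding three Parquet files, 1.4G in total. One reason such files can be queried quickly is that a Parquet file is split into row groups, and its footer metadata can record min/max statistics for each column chunk. A reader can then skip any row group whose range cannot satisfy a filter.

Here is a minimal sketch of that pruning step. It assumes an Int64 timestamp column and a closed time-range query. `RowGroupStats`, `prune_row_groups` and `rows_to_scan` are hypothetical names, not the parquet crate's API. Real readers also have to handle missing statistics and null counts.

```rust
// Hypothetical per-row-group statistics for one Int64 column
// (for example a timestamp), similar to what a Parquet footer can store.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct RowGroupStats {
    pub min: i64,
    pub max: i64,
    pub num_rows: usize,
}

// Returns the indices of row groups that may contain values in [lo, hi].
// Groups whose [min, max] range does not overlap the query are skipped
// without reading or decompressing any of their pages.
pub fn prune_row_groups(stats: &[RowGroupStats], lo: i64, hi: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, s)| s.max >= lo && s.min <= hi)
        .map(|(i, _)| i)
        .collect()
}

// Number of rows that still have to be decoded after pruning.
pub fn rows_to_scan(stats: &[RowGroupStats], selected: &[usize]) -> usize {
    selected.iter().map(|&i| stats[i].num_rows).sum()
}
```

Pruning is conservative. A row group that survives may still contain no matching rows, so the reader applies the actual filter to the decoded values afterwards. The win comes from the groups it never has to touch.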