Dealing with Big Data and moving towards AI
Alexander Zevaykin, PhD
Group Leader at Yandex Infrastructure
ydb.tech/zh

Yandex consists of over 90 services, used by millions of people daily

Yandex builds a lot of its infrastructure in-house
Information search
Computer Vision
Neural language models (GPT)
AI-based simultaneous translation of videos
Self-driving vehicles
Cloud technologies
Speech technologies
Crowdsourcing
Routing and navigation technologies
Weather forecasting technology Meteum 2.0
25700+ employees

Part 1
YDB: dealing with Big Data

What is YDB?
Distributed SQL database for operational and analytical workloads
Horizontal scaling
ACID transactions across multiple availability zones (AZs)
Operability and automatic recovery in case of failures
Scales to millions of transactions per second and petabytes of data
Open-Source with Apache 2.0 license
YDB is an open-source, distributed, highly fault-tolerant SQL database system that combines high availability and scalability with strong consistency and ACID transactions. It can handle transactional (OLTP), analytical (OLAP), and streaming workloads at the same time.

YDB in Yandex
2014: First commit
2017: Base for Yandex Cloud
2022: Open-Source
2024: 35000+ nodes, 5000+ databases, 70+ PB storage
YDB was born at Yandex, Russia's largest IT company, and has ten years of development behind it.

Shared Nothing
Our architecture is based on the shared-nothing principle
Cluster of bare metal or virtual machines
Shared nothing architecture
Commodity hardware
Cluster both stores the data and processes user queries

Compute Storage separation
Scalability
Cost-efficiency
Flexibility
Compute and storage nodes are managed independently
[Diagram: compute nodes, tablets, storage nodes]

Table Partitions Autosplit and Balancing
Tables are split and rebalanced automatically
Split by load
Split by size
YDB evenly distributes table partitions among the nodes

Mirror-3-dc
3 availability zones
Storage factor: 3
copes with the loss of one AZ + one server rack in