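Li Yuanjia (李元佳), co-founder and CTO of Yunxi Technology (云徙科技)

Starting from Uber's move away from Postgres

Background

"Postgres served us well in the early days of Uber, but we ran into significant problems scaling Postgres with our growth. Today, we have some legacy Postgres instances, but the bulk of our databases are either built on top of MySQL (typically using our Schemaless layer) or, in some specialized cases, NoSQL databases like Cassandra."

How Postgres stores data

How Postgres updates data
- Records are multi-versioned, and an update goes through that versioning mechanism.
- The write path is relatively long.
- Reading an old version partway through is relatively expensive.
- Reclaiming and managing old versions is a significant burden.

How indexes relate to data in Postgres (B-Tree + Heap)

How indexes relate to data in MySQL (Clustered Index)

Differences that follow from the index structure
- A lookup through a secondary index requires two index searches.
- If the primary index holds a large amount of data, it consumes considerable space.
- A change in a record's physical location forces a change in every index.

Write amplification as claimed by Uber: table structure
[Figure: table structure, primary index, two secondary indexes]

Write amplification as claimed by Uber (one update, four writes)
- Write the data: Write the new row tuple to the tablespace
- Update the primary index: Update the primary key index to add a record for the new tuple
- Update a secondary index: Update the (first, last) index to add a record for the new tuple
- Update a secondary index: Update the birth_year index to add a record for the new tuple

Write amplification as claimed by Uber: updating data requires updating all indexes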
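
The "one update, four writes" path listed above can be sketched as a small counting model. This is a simplified illustration, not how either database is implemented: the index names, the two functions, and the idea of counting one write per touched structure are all assumptions made for the example. The heap-style model ignores any optimization that lets an update skip index maintenance, and the clustered-style model ignores undo and redo logging.

```python
# Hypothetical index layout mirroring the slide: one primary index
# and two secondary indexes on a table (id, first, last, birth_year).
INDEXES = {
    "primary_key": ("id",),
    "first_last": ("first", "last"),
    "birth_year": ("birth_year",),
}


def heap_style_update(changed_columns):
    """B-Tree + Heap model: index entries point at a tuple's physical location.

    changed_columns is intentionally unused: in this simplified model the new
    row version lands at a new location, so every index needs a new entry
    regardless of which columns changed.
    """
    writes = ["heap: new row tuple"]
    writes += [f"index: {name}" for name in INDEXES]
    return writes


def clustered_style_update(changed_columns):
    """Clustered-index model: secondary index entries point at the primary key."""
    writes = ["clustered index: row updated in place"]
    for name, columns in INDEXES.items():
        if name == "primary_key":
            continue  # the row lives in the clustered index, already counted
        if set(columns) & set(changed_columns):
            writes.append(f"index: {name}")
    return writes


if __name__ == "__main__":
    for model in (heap_style_update, clustered_style_update):
        writes = model({"birth_year"})
        print(model.__name__, len(writes), writes)
```

Under these assumptions, updating only birth_year costs four writes in the heap-style model (the tuple plus all three indexes) and two in the clustered-style model (the row plus the birth_year index). The price of the second model is the two-step secondary-index lookup noted earlier.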

"PostgreSQL always needs to update all indexes on a table when updating rows in the table. MySQL with InnoDB, on the other hand, needs to update only those indexes that contain updated columns."

"if we have a table with a dozen indexes defined on it, an update to a field that is only covered by a single index must be propagated into all 12 indexes to reflect the ctid for the new row".

Postgres's index-free update mechanism (HOT updates)

Postgres streaming replication

Problems with Postgres streaming replication as claimed by Uber
- Write amplification: "This write amplification issue naturally translates into the replication layer as well because replication occurs at the level of on-disk changes."
- Risk of data corruption from physical replication: "During a routine master database promotion to increase database capacity, we ran into a Postgres 9.2 bug."
- Version upgrades: "Because replication records work at the physical level, it's not possible to replicate data between different general availab