PCP Size Auto-tuning for Page Allocator Scalability
Huang, Ying
March 2024

Agenda
- Problems and background
- Design and implementation
- Performance evaluation
- Conclusion

Problems and Background

Buddy System
[Diagram: one node with six LCPUs, each with its own PCP, in front of a single buddy; label: Alloc/free]
- One buddy per zone (node), protected by one zone lock: scalability issue!
- One PCP (Per-CPU Pageset) per LCPU (logical CPU)

Buddy System - Continued
- Physical memory management: node - zone
- One buddy (page allocator) per zone (per node in practice)
- All logical CPUs of one NUMA node share one zone lock: scalability issue!
- More and more cores in one NUMA node in the future
- PCP (Per-CPU pageset) can reduce zone lock contention
  - Batches allocation/freeing
  - Fewer allocations/frees reach the zone

Possible Solution 1: Fake NUMA Node
[Diagram: one node with six LCPUs, split into fake node 0, fake node 1, and fake node 2, backed by Buddy 0, Buddy 1, and Buddy 2; label: Prefer]
- Very good scalability: zone, reclaim, compaction, etc.
- Easy to implement
- More management burden

Possible Solution 2: Splitting Buddy
[Diagram: one node with six LCPUs; one zone split into Buddy0, Buddy1, and Buddy2 over Region0, Region1, and Region2; label: Prefer]
- Refused by community for now
  https://lore.kernel.org/linux-mm/20230511065607.37407-1-
- May be revisited in the future if necessary

Possible Solution 3: Larger Allocation Unit
- Large folios: smaller cache footprint with zone lock held
[Chart: Will-it-scale page allocate/free throughput (GB/s) on ICX-SP; series: order=2, order=0]

Possible Solution 4: PCP Auto-tuning
[Diagram: one node with six LCPUs, each with its own PCP, in front of a single buddy; PCPs labeled "Auto"; label: Alloc/free]
- Larger PCP in effect: less allocation/freeing in buddy
- Auto-tune: as large as required

Allocation Patterns - Kbuild
- Pattern 1: amplitude 100, period 1s
  - Original PCP
- Pattern 2: amplitude 25k, period 0.5s-1s
  - Auto-tuned PCP high
- Pattern 3: amplitude 100k, period 10s
  - Not covered by PCP

Design and