RISC-V Summit China 2024

Enabling Hardware Sampling Based PGO for RISC-V Platform
Gao Yichuan()
RISC-V Agile Design Lab
Intel Labs China
Intel Confidential


PGO Basics

PGO (Profile-Guided Optimization)
  Use runtime feedback to improve software performance
  Diagram labels: Profiling, Run (Train), Optimize

Software vs Hardware PGO

SWPGO (Instrumentation)
  Insert profiling code snippet to software
  Profiled data:
    Branch/Jump count/target
    Register/Memory values
  No hardware requirement
  High profiling overhead

HWPGO (Sampling) (this talk)
  Use hardware sample collect methods
  Profiled data:
    Performance Counter values
    Branch/Jump count/target (with LBR/CTR)
    Register/Memory values (not yet)
  Require PMU hardware support
  Low profiling overhead


RISC-V PMU

PMU (Performance Monitoring Unit)
  Hardware unit for counting occurrence of uArch events

RISC-V PMU:
  provide fixed and event-based hpmcounters
  Software access via CSR interface


RISC-V PMU Extensions for HWPGO

Latest PMU extensions
  Smcdeleg & Ssccfg: PMU counter delegation
  Smcntrpmf & Sscofpmf: Counter mode filtering and overflow interrupt
  Smctr & Ssctr: Hart control transfer records (in next slide)
  Future extensions for precise event sampling, register value profiling, etc.

Slide annotations: Event-based sampling; PMU config in S-mode; Control flow info recording; RVA23

Base extensions (widely adopted)
  Zicntr & Zihpm: Provide basic counters for PMU events


RISC-V CTR for HWPGO

Record control transfer history
  Jump instructions (including function calls and returns)
  Taken/not-taken branch instructions
  Traps, and trap returns

Data is organized as circular buffer (FIFO)
  Source PC, Target PC
  Type, Cycl