DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning

Zhihong Shao*, Yuxiang Luo*, Chengda Lu*, Z.Z. Ren*
Jiewen Hu, Tian Ye, Zhibin Gou, Shirong Ma, Xiaokang Zhang

DeepSeek-AI

https:/

Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scaling reasoning with reinforcement learning that rewards correct final answers, LLMs have improved from poor performance to saturating quantitative reasoning competitions like AIME and HMMT in one year. However, this approach faces fundamental limitations. Pursuing higher final-answer accuracy doesn't address a key issue: correct answers don't guarantee correct reasoning. Moreover, many mathematical tasks such as theorem proving require rigorous step-by-step derivation rather than numerical answers, making final-answer rewards inapplicable. To push the limits of deep reasoning, we believe it is necessary to verify the comprehensiveness and rigor of mathematical reasoning. Self-verification is particularly important for scaling test-time compute, especially for open problems without known solutions. Towards self-verifiable mathematical reasoning, we investigate how to train an accurate and faithful LLM-based verifier for theorem proving. We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in its own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier. Our resulting model, DeepSeekMath-V2, demonstrates strong theorem-proving capabilities, achieving gold-le
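The abstract's generator-verifier loop can be sketched in miniature. This is an illustrative sketch only, not the report's implementation: the real system uses LLMs trained with reinforcement learning, whereas every name here (`toy_verifier`, `self_verify_and_revise`, `scaled_verification_label`, the `"UNRESOLVED"` marker) is a hypothetical stand-in chosen to show the control flow of (a) a generator revising its own proof until its verifier finds no remaining issues, and (b) aggregating many verifier samples to label a hard-to-verify proof.

```python
# Hypothetical sketch of the generator-verifier loop described in the abstract.
# All functions and the "UNRESOLVED" marker are illustrative stand-ins, not the
# report's actual (LLM + RL) implementation.

def toy_verifier(proof: str) -> float:
    """Stand-in for the LLM-based verifier: returns a rigor score in [0, 1]."""
    # Toy heuristic: a proof with unresolved issues gets score 0, else 1.
    return 0.0 if "UNRESOLVED" in proof else 1.0

def self_verify_and_revise(draft: str, max_rounds: int = 3) -> str:
    """Generator loop: identify and resolve issues in its own proof
    before finalizing it, as the abstract's training objective incentivizes."""
    proof = draft
    for _ in range(max_rounds):
        if toy_verifier(proof) >= 1.0:  # verifier finds no remaining issues
            break
        # Stand-in for one revision step that fixes one flagged issue.
        proof = proof.replace("UNRESOLVED", "resolved", 1)
    return proof

def scaled_verification_label(proof: str, n_samples: int = 8) -> float:
    """Scaled verification compute: aggregate many verifier samples into a
    single label for a hard-to-verify proof (new verifier training data)."""
    scores = [toy_verifier(proof) for _ in range(n_samples)]
    return sum(scores) / len(scores)
```

Under this toy setup, a draft with two flagged issues is repaired within the revision budget, and the aggregated label for a clean proof is 1.0; the point is only the shape of the loop, in which the same verifier both rewards the generator and, with more compute, labels new training data for itself.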