DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Figure 1 | Benchmark performance of DeepSeek-R1. Accuracy/Percentile (%):

                   AIME 2024  Codeforces  GPQA Diamond  MATH-500  MMLU     SWE-bench Verified
                   (Pass@1)   (Percentile)  (Pass@1)    (Pass@1)  (Pass@1) (Resolved)
DeepSeek-R1          79.8       96.3         71.5         97.3      90.8     49.2
OpenAI-o1-1217       79.2       96.6         75.7         96.4      91.8     48.9
DeepSeek-R1-32B      72.6       90.6         62.1         94.3      87.4     36.8
OpenAI-o1-mini       63.6       93.4         60.0         90.0      85.2     41.6
DeepSeek-V3          39.2       58.7         59.1         90.2      88.5     42.0

Contents

1 Introduction
  1.1 Contributions
  1.2 Summary of Evaluation Results
2 Approach
  2.1 Overview
  2.2 DeepSeek-R1-Zero: Reinforcement Learning on the Base Model
    2.2.1 Reinforcement Learning Algorithm
    2.2.2 Reward Modeling
    2.2.3 Training Template
    2.2.4 Performance, Self-evolution Process and Aha Moment of DeepSeek-R1-Zero
  2.3 DeepSeek-R1: Reinforcement Learning with Cold Start
    2.3.1 Cold Start
    2.3.2 Reasoning-oriented Reinforcement Learning
    2.3.3 Rejection Sampling and