DeepSeek R1 Technical Report (English Version) (22 pages).pdf

ID: 599099, PDF, 22 pages, 1.32 MB


DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI

Abstract
We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, numerous powerful and intriguing reasoning behaviors emerge naturally in DeepSeek-R1-Zero. However, it encounters challenges such as poor readability and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates multi-stage training and cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1-1217 on reasoning tasks. To support the research community, we open-source DeepSeek-R1-Zero, DeepSeek-R1, and six dense models (1.5B, 7B, 8B, 14B, 32B, 70B) distilled from DeepSeek-R1 based on Qwen and Llama.

Figure 1 | Benchmark performance of DeepSeek-R1 (Accuracy / Percentile, %).

Benchmark                      DeepSeek-R1  OpenAI-o1-1217  DeepSeek-R1-32B  OpenAI-o1-mini  DeepSeek-V3
AIME 2024 (Pass@1)                    79.8            79.2             72.6            63.6         39.2
Codeforces (Percentile)               96.3            96.6             90.6            93.4         58.7
GPQA Diamond (Pass@1)                 71.5            75.7             62.1            60.0         59.1
MATH-500 (Pass@1)                     97.3            96.4             94.3            90.0         90.2
MMLU (Pass@1)                         90.8            91.8             87.4            85.2         88.5
SWE-bench Verified (Resolved)         49.2            48.9             36.8            41.6         42.0

Contents
1 Introduction
  1.1 Contributions
  1.2 Summary of Evaluation Results
2 Approach
  2.1 Overview
  2.2 DeepSeek-R1-Zero: Reinforcement Learning on the Base Model
    2.2.1 Reinforcement Learning Algorithm
    2.2.2 Reward Modeling
    2.2.3 Training Template
    2.2.4 Performance, Self-evolution Process and Aha Moment of DeepSeek-R1-Zero
  2.3 DeepSeek-R1: Reinforcement Learning with Cold Start
    2.3.1 Cold Start
    2.3.2 Reasoning-oriented Reinforcement Learning
    2.3.3 Rejection Sampling and …
