DeepSeek: 2026 DeepSeek V4 Technical Report (English original + translation) (58 pages).pdf


DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

DeepSeek-AI

Abstract

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models: DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated), both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; and (3) the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, redefines the state-of-the-art for open models, outperforming its predecessors in core tasks. Meanwhile, the DeepSeek-V4 series is highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.

[Figure: benchmark comparison. Panels: SimpleQA Verified (Pass@1), HLE (Pass@1), Apex Shortlist (Pass@1), Codeforces (Rating), SWE Verified (Resolved), Terminal Bench 2.0 (Acc), Toolathlon (Pass@1). Y-axis: Accuracy / Pass@1 (%).]
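The headline figures in the abstract can be restated as ratios with a few lines of arithmetic. The sketch below uses only the numbers quoted above (1.6T/49B, 284B/13B, 27%, 10%); the function and variable names are illustrative and do not come from the report.

```python
# Figures quoted in the abstract; everything derived below is plain arithmetic.
# Parameter counts are in billions. All names here are illustrative assumptions.
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},   # 1.6T total, 49B activated
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},  # 284B total, 13B activated
}


def activated_fraction(total_b: float, active_b: float) -> float:
    """Share of an MoE model's parameters that are activated per token."""
    return active_b / total_b


def reduction_factor(relative_cost: float) -> float:
    """Convert 'requires only X of the baseline' into an 'N times lower' factor."""
    return 1.0 / relative_cost


for name, p in MODELS.items():
    frac = activated_fraction(p["total_b"], p["active_b"])
    print(f"{name}: {frac:.2%} of parameters activated per token")

# DeepSeek-V4-Pro relative to DeepSeek-V3.2 at a one-million-token context.
print(f"Single-token inference FLOPs: {reduction_factor(0.27):.1f}x lower")
print(f"KV cache: {reduction_factor(0.10):.1f}x smaller")
```

Read this way, both models activate only a few percent of their parameters per token, and the stated 27% and 10% figures correspond to roughly a 3.7x reduction in single-token FLOPs and a 10x reduction in KV cache.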