DeepSeek V4 Release Roundup: Pricing, Benchmarks, Guides, and Coding Plan Boundaries

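A roundup of DeepSeek V4's release positioning, benchmark results, and migration guide, plus the purchase boundaries of the Coding Plans this site has in stock.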

Chinese Summary


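This Chinese-language edition keeps the original source links and labels DeepSeek's official announcements, press coverage, and market rumors separately. Purchasing decisions should still be based on the actual in-stock cards at /zh/pricing; a model appearing in news or benchmarks does not mean it is available for purchase.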

2M-token context window: roughly 16x V3's 128K. Leaked needle-in-a-haystack charts reportedly show 97%+ recall at 1.8M-token depth, which the community reads as a sign of native long-sequence training rather than RoPE-based extrapolation.
Dynamic sparse architecture: routing is claimed to be both per-token and per-layer, with ~30B active parameters out of 600B+ total. Alleged internal profiling shows expert-utilization variance about 40% lower than V3's.
Trained on Huawei Ascend: multiple cross-referenced posts say V4 was pre-trained entirely on Ascend 910C clusters (10k+ NPUs) using MindSpore plus an in-house collective-communications library, with no NVIDIA hardware in the training path.
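The ratios in the first two items can be sanity-checked with a few lines of arithmetic. This is a minimal sketch that plugs in the rumored figures above as assumptions (none of them are confirmed specs); the constant and function names are illustrative only.

```python
# Back-of-the-envelope checks on the rumored V4 figures listed above.
# Every input is an unconfirmed number from leaks or posts, not an official spec.

V3_CONTEXT = 128_000       # "128K", read as decimal thousands
V4_CONTEXT = 2_000_000     # "2M", read as decimal millions
NEEDLE_DEPTH = 1_800_000   # depth of the leaked 97%+ recall point
ACTIVE_PARAMS = 30e9       # "~30B active"
TOTAL_PARAMS = 600e9       # "600B+ total", so this is a lower bound


def ratio(numerator: float, denominator: float) -> float:
    """Plain ratio helper used for all three checks."""
    return numerator / denominator


if __name__ == "__main__":
    # 2M / 128K is ~15.6x with decimal units and exactly 16x with binary units,
    # so "roughly 16x" holds under either reading.
    print(f"context ratio (decimal): {ratio(V4_CONTEXT, V3_CONTEXT):.2f}x")
    print(f"context ratio (binary):  {ratio(2 * 1024**2, 128 * 1024):.2f}x")
    # The reported needle sits at 90% of the full window, not at the very end.
    print(f"needle depth: {ratio(NEEDLE_DEPTH, V4_CONTEXT):.0%} of the window")
    # With a 600B+ total, ~30B active means roughly 5% or less of weights per token.
    print(f"active share: <= {ratio(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```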

@teortaxesTex (X): "V4 isn't just bigger MoE. Routing is dynamic per token AND per layer — think MoE x MoD hybrid."
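The "MoE x MoD hybrid" description in the quote above can be made concrete with a toy router. The sketch below is a hypothetical illustration of that general idea, not DeepSeek's actual design: it assumes a Mixture-of-Depths style router that keeps only the top-`capacity` tokens at each layer, followed by a standard top-k softmax gate over experts for the tokens that are kept. All names and sizes (`mod_moe_layer`, `build_layers`, `capacity`, `top_k`, the random router weights) are invented for illustration; a trained model would learn the routers, and the original Mixture-of-Depths formulation also scales the processed output by the router score, which is omitted here.

```python
import numpy as np


def mod_moe_layer(x, w_skip, w_gate, experts, capacity, top_k):
    """One toy layer: a depth router picks which tokens this layer processes
    (Mixture-of-Depths style), then a gate picks top_k experts per processed
    token (Mixture-of-Experts style). Unselected tokens pass through unchanged.

    x:       (tokens, d) hidden states
    w_skip:  (d,) hypothetical depth-router weights
    w_gate:  (d, n_experts) hypothetical expert-gate weights
    experts: list of (d, d) matrices standing in for expert FFNs
    """
    depth_scores = x @ w_skip
    processed = np.argsort(-depth_scores)[:capacity]  # chosen per token, per layer
    out = x.copy()
    expert_counts = np.zeros(len(experts), dtype=int)
    for t in processed:
        logits = x[t] @ w_gate
        chosen = np.argsort(-logits)[:top_k]
        # Softmax over the chosen experts' logits only.
        gates = np.exp(logits[chosen] - logits[chosen].max())
        gates /= gates.sum()
        update = sum(g * (x[t] @ experts[e]) for g, e in zip(gates, chosen))
        out[t] = x[t] + update  # residual connection
        expert_counts[chosen] += 1
    return out, processed, expert_counts


def build_layers(rng, n_layers, d, n_experts):
    """Independent router weights per layer, so routing can differ layer by layer."""
    layers = []
    for _ in range(n_layers):
        w_skip = rng.normal(size=d)
        w_gate = rng.normal(size=(d, n_experts))
        experts = [rng.normal(scale=0.1, size=(d, d)) for _ in range(n_experts)]
        layers.append((w_skip, w_gate, experts))
    return layers


def forward(x, layers, capacity, top_k):
    """Run the stack and record which tokens each layer processed and the expert load."""
    trace = []
    for w_skip, w_gate, experts in layers:
        x, processed, counts = mod_moe_layer(x, w_skip, w_gate, experts, capacity, top_k)
        trace.append((sorted(processed.tolist()), counts.tolist()))
    return x, trace


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 8))  # 6 tokens, hidden size 8
    layers = build_layers(rng, n_layers=3, d=8, n_experts=4)
    _, trace = forward(x, layers, capacity=4, top_k=2)
    for i, (tokens, load) in enumerate(trace):
        print(f"layer {i}: processed tokens {tokens}, expert load {load}")
```

Because each layer draws its own router weights, both the set of processed tokens and the expert load can change from layer to layer, which is the "per token AND per layer" property the posts describe.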

@dylan522p (X): "First frontier-class model trained 100% on domestic silicon — if true, this is the actual decoupling moment."

u/llama_maxxer (r/LocalLLaMA, 842 upvotes): "If 2M is real, the entire RAG category has to be rewritten this year."

simonw (HN, 612 points): "If the weights drop with a real 2M context that isn't a lobotomy past 200K, this is a bigger deal than V3 was."

[ ] Official technical report PDF
[ ] HuggingFace weights published
[ ] Third-party long-context benchmark
[ ] Ascend training MFU numbers

The consensus release window is late April. If all four items land within the same two-week span, V4 would become the first end-to-end domestic frontier-model event of 2026.