DeepSeek V4 launch roundup: 2M context, dynamic sparse, Ascend-trained
Cross-platform chatter on X (Twitter), Reddit, and Hacker News converges on three claims about DeepSeek V4: a 2,000,000-token context window, a dynamic sparse MoE architecture, and end-to-end pre-training on Huawei Ascend hardware.
Three headline claims
- 2M-token context window: roughly 16x V3's 128K. Leaked needle-in-a-haystack charts reportedly show 97%+ recall with the needle at 1.8M-token depth, which the community reads as evidence of native long-sequence training rather than RoPE extrapolation.
- Dynamic sparse architecture: routing is claimed to be both per-token and per-layer, with ~30B active parameters out of 600B+ total. Alleged internal profiling shows expert-utilization variance dropping ~40% versus V3.
- Trained on Huawei Ascend: multiple cross-referenced posts say V4 was pre-trained entirely on Ascend 910C clusters (10k+ NPUs) using MindSpore plus an in-house collective-comms library — zero NVIDIA in the training path.
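None of the routing claims above are verified, but the mechanism being described is straightforward to illustrate. Below is a toy numpy sketch of per-token top-k gating and of the expert-utilization variance metric the leaked profiling supposedly reports. All shapes, names, and numbers here are ours for illustration, not DeepSeek's.

```python
import numpy as np

def route_tokens(hidden, gate_w, k=2):
    """Per-token top-k gating: each token independently picks its k experts.
    hidden: (tokens, d_model), gate_w: (d_model, n_experts)."""
    logits = hidden @ gate_w                               # (tokens, n_experts)
    topk = np.argsort(logits, axis=-1)[:, -k:]             # k best experts per token
    sel = np.take_along_axis(logits, topk, axis=-1)        # their logits
    weights = np.exp(sel - sel.max(-1, keepdims=True))     # softmax over selected only
    weights /= weights.sum(-1, keepdims=True)
    return topk, weights

def utilization_variance(topk, n_experts):
    """Share of routed slots each expert receives; variance measures load imbalance
    (0 would mean a perfectly balanced router)."""
    counts = np.bincount(topk.ravel(), minlength=n_experts)
    load = counts / counts.sum()
    return load.var()

rng = np.random.default_rng(0)
tokens, d_model, n_experts = 512, 64, 16
hidden = rng.standard_normal((tokens, d_model))
gate_w = rng.standard_normal((d_model, n_experts))

topk, weights = route_tokens(hidden, gate_w, k=2)
var = utilization_variance(topk, n_experts)
```

A "per-layer" dynamic scheme would simply run a separate `route_tokens` (with its own `gate_w`, and possibly its own `k`) at every MoE layer, so a token's expert set changes layer to layer.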
Community reactions
@teortaxesTex (X): "V4 isn't just bigger MoE. Routing is dynamic per token AND per layer — think MoE x MoD hybrid."
@dylan522p (X): "First frontier-class model trained 100% on domestic silicon — if true, this is the actual decoupling moment."
u/llama_maxxer (r/LocalLLaMA, 842 upvotes): "If 2M is real, the entire RAG category has to be rewritten this year."
simonw (HN, 612 points): "If the weights drop with a real 2M context that isn't a lobotomy past 200K, this is a bigger deal than V3 was."
What still needs to be verified
[ ] Official technical report PDF
[ ] HuggingFace weights published
[ ] Third-party long-context benchmark
[ ] Ascend training MFU numbers
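The leaked recall charts can't be reproduced without the model, but the needle-in-a-haystack test itself is easy to run once weights or an API exist. Here is a minimal harness sketch with the model call stubbed out; every name and the filler/needle text are ours, and a real run would replace `ask_model` with an actual API call.

```python
import random

def build_haystack(filler, needle, n_chunks, depth):
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)
    of a haystack built from repeated filler chunks."""
    chunks = [filler] * n_chunks
    chunks.insert(int(depth * n_chunks), needle)
    return "\n".join(chunks)

def needle_recall(ask_model, depths, trials=1):
    """Recall at each depth: fraction of trials where the model's
    answer contains the randomly generated secret."""
    results = {}
    for depth in depths:
        hits = 0
        for _ in range(trials):
            secret = f"magic-{random.randint(1000, 9999)}"
            haystack = build_haystack(
                filler="The sky was a uniform grey that afternoon.",
                needle=f"The secret code is {secret}.",
                n_chunks=1000,
                depth=depth,
            )
            answer = ask_model(haystack + "\nWhat is the secret code?")
            hits += secret in answer
        results[depth] = hits / trials
    return results

# Stub that "reads" the whole context perfectly; swap in a real model call.
perfect_model = lambda prompt: prompt
scores = needle_recall(perfect_model, depths=[0.0, 0.5, 0.9], trials=2)
```

A third-party benchmark would sweep many depths and context lengths (out to the claimed 2M tokens) and plot recall as a heatmap, which is what the leaked charts purport to show.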
The consensus release window is late April. If all four land within two weeks of each other, V4 would be the first fully domestic, end-to-end frontier-model event of 2026.