Aggregated DeepSeek V4 specs: ~1T MoE, Engram memory, 1M+ context
Third-party explainer sites deepseekv4.dev and deepseek-v4.ai aggregate the leaked spec sheet: ~1T-parameter MoE, Engram memory core, mHC hyper-connections reasoning, and a 1M+ token context window.
Rumored spec sheet
Pulled from the third-party explainer pages at deepseekv4.dev and deepseek-v4.ai:
- ~1 trillion parameters in a Mixture-of-Experts (MoE) layout.
- Engram Memory Core — described as "conditional O(1) memory lookup" for persistent recall (see the sketch after this list).
- mHC Hyper-Connections Reasoning — a new information-flow primitive aimed at multi-step reasoning.
- 1M+ token context window — long-context positioning for repo-level code and long-document work.
- Efficiency: "around 40% lower memory use and up to 1.8x faster inference" vs earlier architectures.
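DeepSeek has published nothing about how a "conditional O(1) memory lookup" would actually work, so the sketch below is purely illustrative: it assumes a hash-keyed external store that is only consulted when a gating score clears a threshold, which would make each read a single constant-time dict access. Every name in it (EngramStyleMemory, gate_threshold, gate_score) is invented for the example.

```python
# Hypothetical illustration only; nothing here is from DeepSeek.
# One plausible reading of "conditional O(1) memory lookup": a hash-keyed
# memory the model consults only when a gate fires, so a closed gate costs
# nothing and a hit is a single average-case O(1) dictionary access.
from dataclasses import dataclass, field


@dataclass
class EngramStyleMemory:
    """Toy key-value memory with constant-time, gate-conditioned reads."""
    store: dict = field(default_factory=dict)
    gate_threshold: float = 0.5  # made-up gating cutoff

    def write(self, key: str, value: str) -> None:
        self.store[key] = value  # O(1) average-case insert

    def read(self, key: str, gate_score: float) -> str | None:
        # "Conditional": only touch memory when the gate is confident enough.
        if gate_score < self.gate_threshold:
            return None
        return self.store.get(key)  # O(1) average-case lookup


memory = EngramStyleMemory()
memory.write("user:locale", "en-GB")
print(memory.read("user:locale", gate_score=0.9))  # hit -> "en-GB"
print(memory.read("user:locale", gate_score=0.2))  # gate closed -> None
```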
Multimodal & workflow positioning
From deepseekv4.dev:
"Understand words, visuals, and voice together." "Designed to hold much larger context so long documents stay connected." "Turn repeatable tasks into reliable workflows."
Positioned for repo-level coding, long-context reasoning, and agentic workflows, with an SDK + sandbox integration path and planned enterprise governance controls.
Leaked benchmarks (unverified)
| Benchmark | Claimed score |
|-----------|---------------|
| SWE-Bench Verified | 83.7% |
| HumanEval | ~90-92% |
| AIME 2026 | 99.4% |
Pricing rumor
"$0.01 - $0.14 / 1M tokens" — expected API pricing band (unverified).
Status
- Not officially released as of mid-April 2026; the latest official release remains DeepSeek-V3.2.
- Multiple community reports still point to a near-term launch window.
- All figures above are aggregator-sourced and remain unverified until DeepSeek ships the technical report and weights.
Read alongside the earlier roundup (2M context, dynamic sparse routing, Ascend training) for the full picture.