Aggregated DeepSeek V4 specs: ~1T MoE, Engram memory, 1M+ context
Third-party explainer sites deepseekv4.dev and deepseek-v4.ai aggregate the leaked spec sheet: ~1T-parameter MoE, Engram memory core, mHC hyper-connections reasoning, and a 1M+ token context window.
Rumored spec sheet
Pulled from the third-party explainer pages at deepseekv4.dev and deepseek-v4.ai:
- ~1 trillion parameters in a Mixture-of-Experts (MoE) layout.
- Engram Memory Core — described as "conditional O(1) memory lookup" for persistent recall (see the sketch after this list).
- mHC Hyper-Connections Reasoning — a new information-flow primitive aimed at multi-step reasoning.
- 1M+ token context window — long-context positioning for repo-level code and long-document work.
- Efficiency: "around 40% lower memory use and up to 1.8x faster inference" vs earlier architectures.
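DeepSeek has published nothing about how a "conditional O(1) memory lookup" would actually work, so the sketch below is purely illustrative: it assumes a hash-keyed external store that is only consulted when a gating score clears a threshold, which would make each read a single constant-time dict access. Every name in it (EngramStyleMemory, gate_threshold, gate_score) is invented for the example.

```python
# Hypothetical illustration only; nothing here is from DeepSeek.
# One plausible reading of "conditional O(1) memory lookup": a hash-keyed
# memory the model consults only when a gate fires, so a closed gate costs
# nothing and a hit is a single average-case O(1) dictionary access.
from dataclasses import dataclass, field


@dataclass
class EngramStyleMemory:
    """Toy key-value memory with constant-time, gate-conditioned reads."""
    store: dict = field(default_factory=dict)
    gate_threshold: float = 0.5  # made-up gating cutoff

    def write(self, key: str, value: str) -> None:
        self.store[key] = value  # O(1) average-case insert

    def read(self, key: str, gate_score: float) -> str | None:
        # "Conditional": only touch memory when the gate is confident enough.
        if gate_score < self.gate_threshold:
            return None
        return self.store.get(key)  # O(1) average-case lookup


memory = EngramStyleMemory()
memory.write("user:locale", "en-GB")
print(memory.read("user:locale", gate_score=0.9))  # hit -> "en-GB"
print(memory.read("user:locale", gate_score=0.2))  # gate closed -> None
```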
Multimodal & workflow positioning
From deepseekv4.dev:
"Understand words, visuals, and voice together." "Designed to hold much larger context so long documents stay connected." "Turn repeatable tasks into reliable workflows."
Positioned for repo-level coding, long-context reasoning, and agentic workflows, with an SDK + sandbox integration path and planned enterprise governance controls.
Leaked benchmarks (unverified)
| Benchmark | Claimed score |
|-----------|---------------|
| SWE-Bench Verified | 83.7% |
| HumanEval | ~90-92% |
| AIME 2026 | 99.4% |
Pricing rumor
"$0.01 - $0.14 / 1M tokens" — expected API pricing band (unverified).
Status
- Not officially released as of mid-April 2026; the latest official release remains DeepSeek-V3.2.
- Multiple community reports still point to a near-term launch window.
- All figures above are aggregator-sourced and remain unverified until DeepSeek ships the technical report and weights.
Read alongside the earlier roundup (2M context, dynamic sparse routing, Ascend training) for the full picture.