Updated 2026-06-28
DeepSeek V4 pricing per million tokens: read the official contract before estimating spend
A lot of DeepSeek pricing answers online flatten everything into one vague number. The official DeepSeek Models & Pricing page is more specific than that. It prices the API in units of per 1M tokens, splits input billing into cache hit and cache miss rates, and keeps Flash and Pro on different cost curves. That makes this page useful for a narrow but valuable search intent: developers and buyers who need a current DeepSeek-first explanation of what the official billing contract actually says before they choose a model or estimate a budget.
1. What DeepSeek officially charges today
DeepSeek's official Models & Pricing page currently lists prices in units of per 1M tokens. That matters because smaller example prompts can hide the real production math if you do not normalize usage at the same unit.
On the current official page, `deepseek-v4-flash` is priced at `$0.0028` per 1M input tokens when the cache hits, `$0.14` per 1M input tokens when the cache misses, and `$0.28` per 1M output tokens. `deepseek-v4-pro` is currently listed at `$0.003625`, `$0.435`, and `$0.87` for the same buckets.
Those numbers make the first practical point obvious: Flash is not just a little cheaper than Pro. It is materially cheaper, especially on cache misses and output-heavy workloads.
Sources checked
- DeepSeek official Models & Pricing page - Primary source for the current per-1M token pricing table, context length, and output cap.
2. Cache hit versus cache miss is not a cosmetic line item
The official DeepSeek page separates input pricing into cache hit and cache miss because repeated context can bill differently from fresh context. If your workload reuses long prompts, system instructions, or prior context effectively, the cost profile changes a lot.
That means the right buyer question is not only 'Flash or Pro'. It is also 'How often does my input qualify as a cache hit'. A team that ignores that split can overestimate or underestimate DeepSeek spend by a wide margin.
| Model | 1M input cache hit | 1M input cache miss | 1M output |
|---|---|---|---|
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 |
3. A simple way to think about Flash versus Pro
Use Flash as the financial default when the workload is broad support chat, repeated structured transforms, agent side tasks, or other traffic where throughput matters more than squeezing out the strongest reasoning on every call.
Use Pro when the prompts are harder, the reasoning path matters, or the downstream cost of a weaker answer is higher than the API savings. The official page does not tell you which one to buy. It gives you the contract so you can choose intentionally.
The clean DeepSeek-first rule is simple: start from the user problem, then map that problem onto the official Pro or Flash pricing curve instead of treating the most expensive model as the default out of habit.
4. Pricing is only part of the contract
The same official page also shows a shared 1M context length and a maximum output of 384K for both `deepseek-v4-flash` and `deepseek-v4-pro`. That matters because model selection is really a three-part decision: price, capability, and traffic shape.
If one model were cheaper but had a radically smaller context or output ceiling, the tradeoff would look different. The current official table says the main split is economic and operational rather than a smaller context window on Flash.
5. Do not confuse site pricing cards with the official model contract
This site's `/pricing` page is for in-stock DeepSeek Coding Plans only. It is not a mirror of every possible official DeepSeek model or every future contract DeepSeek might document.
That distinction matters. The official docs tell you what DeepSeek itself currently charges and supports. The local `/pricing` page only reflects what inventory is actually available for resale on this site.
6. A buyer checklist before you commit to a model
Answer six questions before standardizing on a DeepSeek model: how many tokens you send in, how many you expect back, how often cached context repeats, whether the workload is output-heavy, how much concurrency you need, and whether weaker reasoning would create downstream support cost.
If you need the throughput side of that decision, continue with `/guides/deepseek-v4-concurrency-limits`. If you need broader endpoint selection context, compare this page with `/guides/deepseek-openai-vs-anthropic-api-routing` and `/guides/deepseek-v4-pro-vs-flash`.
FAQ
How does DeepSeek officially price V4 usage?
The current official DeepSeek Models & Pricing page bills in units of per 1M tokens and splits input pricing into cache hit and cache miss rates, with separate output pricing for Flash and Pro.
Is DeepSeek V4 Flash cheaper than DeepSeek V4 Pro?
Yes. On the current official pricing page, Flash is materially cheaper than Pro across cache-hit input, cache-miss input, and output billing.
Why does DeepSeek separate cache hit and cache miss pricing?
Because repeated cached input can be billed differently from fresh uncached input, which changes the real cost profile for long or repetitive workloads.
Does this page mean both models are always sold on the pricing page?
No. This page explains the official DeepSeek model contract. The local `/pricing` page still lists only the in-stock DeepSeek Coding Plans actually available on this site.
What should I read next if my real problem is throughput, not price?
Read `/guides/deepseek-v4-concurrency-limits` next, because concurrency limits and 429 behavior affect production traffic even when the per-token price looks acceptable.
The practical pricing rule is simple: read the official per-1M token contract, separate cache hit from cache miss, and choose Flash or Pro based on workload shape instead of treating model selection as a branding decision.
Related model comparisons
Continue from this guide into structured DeepSeek-first comparison pages with model tables, routing advice, and pricing context.