Updated 2026-07-01

DeepSeek reasoning model guide: remove `reasoning_content` from replayed inputs and stop tweaking knobs that do nothing

DeepSeek's official reasoning-model guide is more specific than many wrapper tutorials. It says `max_tokens` covers both the chain-of-thought (CoT) segment and the final answer, it names the familiar OpenAI-style parameters that are either silently ignored or rejected with an error, and it warns that sending `reasoning_content` back in later input messages triggers a 400 error. This page is therefore aimed at teams debugging DeepSeek reasoning flows, not at readers looking for a generic 'thinking model' explainer.

1. `max_tokens` includes the hidden reasoning budget

DeepSeek's official reasoning-model page says `max_tokens` is the maximum output length including the CoT portion. The docs give a default of 32K and a maximum of 64K.

That is operationally different from treating the visible answer as the whole budget. If a reasoning-heavy task burns tokens in the CoT path first, the final answer can be shorter than a team expects even when the request looked generous on paper.
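One way to see where the budget went is to inspect the usage block and the finish reason after each call. The sketch below is illustrative only: `summarizeOutputBudget` is a hypothetical helper, the response is mocked, and the `usage.completion_tokens_details.reasoning_tokens` field is an assumption about the response shape, so the code treats it as optional.

```javascript
// Hypothetical helper: summarize how a completion spent its max_tokens budget.
// Assumes an OpenAI-compatible response shape. The reasoning_tokens field is
// an assumption and may be absent, in which case the split is reported as null.
function summarizeOutputBudget(response, maxTokens) {
  const choice = response.choices[0];
  const usage = response.usage ?? {};
  const completionTokens = usage.completion_tokens ?? 0;
  const details = usage.completion_tokens_details ?? {};
  const reasoningTokens = details.reasoning_tokens ?? null;

  return {
    maxTokens,
    completionTokens,
    reasoningTokens,
    // Tokens that went to the visible answer rather than the CoT segment.
    answerTokens:
      reasoningTokens === null ? null : completionTokens - reasoningTokens,
    // finish_reason "length" means generation stopped at the token cap.
    truncated: choice.finish_reason === "length",
  };
}

// Mocked response; the numbers are illustrative, not measured.
const mockBudgetResponse = {
  choices: [{ finish_reason: "length", message: { content: "Partial answer" } }],
  usage: {
    completion_tokens: 4096,
    completion_tokens_details: { reasoning_tokens: 3900 },
  },
};

console.log(summarizeOutputBudget(mockBudgetResponse, 4096));
```

If `truncated` is true and most of the completion went to reasoning, raising `max_tokens` is a more plausible fix than rewriting the prompt.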

If the main production issue is cost rather than reasoning budget, pair this page with `/guides/deepseek-v4-pricing-per-million-tokens` and `/guides/deepseek-error-codes-guide`.

Sources checked

DeepSeek official reasoning-model guide - Primary source for the reasoning-model token budget and replay rules.

2. Some familiar sampling parameters do nothing here

DeepSeek explicitly lists `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` as unsupported for the reasoning model. The compatibility rule is subtle: setting those fields does not throw an error, but it also has no effect.

That matters because many teams keep tweaking those knobs while debugging output quality, then misread a no-op change as proof that the model is unstable. Under the official contract, those fields are not a live control surface on the reasoning route.

The same page is stricter about `logprobs` and `top_logprobs`: DeepSeek says those settings do trigger an error.

DeepSeek reasoning-model parameter support boundary
Parameter	Official behavior	What to do
`temperature` / `top_p`	Accepted for compatibility but no effect	Remove them from reasoning templates
`presence_penalty` / `frequency_penalty`	Accepted for compatibility but no effect	Do not treat them as tuning levers
`logprobs` / `top_logprobs`	Rejected with an error	Disable them on reasoning routes
`max_tokens`	Counts CoT + final answer	Budget for reasoning, not just visible output
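The table above can be enforced in code before a request leaves your service. This is a minimal sketch, not an official client feature: `sanitizeReasoningRequest` is a hypothetical helper, and the model id in the usage example is a placeholder.

```javascript
// Fields the guide describes as accepted but inert on the reasoning route.
const INERT_FIELDS = ["temperature", "top_p", "presence_penalty", "frequency_penalty"];
// Fields the guide describes as rejected with an error.
const REJECTED_FIELDS = ["logprobs", "top_logprobs"];

// Hypothetical helper: return a cleaned copy of the request body plus a
// record of what was removed, so the change is visible in logs.
function sanitizeReasoningRequest(request) {
  const cleaned = { ...request };
  const removed = { inert: [], rejected: [] };

  for (const field of INERT_FIELDS) {
    if (field in cleaned) {
      delete cleaned[field];
      removed.inert.push(field);
    }
  }
  for (const field of REJECTED_FIELDS) {
    if (field in cleaned) {
      delete cleaned[field];
      removed.rejected.push(field);
    }
  }
  return { cleaned, removed };
}

// Usage with a placeholder model id (not taken from this page).
const { cleaned, removed } = sanitizeReasoningRequest({
  model: "<reasoning-model-id>",
  max_tokens: 8192,
  temperature: 0.2,
  logprobs: true,
  messages: [{ role: "user", content: "Which is larger, 9.11 or 9.8?" }],
});
console.log(cleaned, removed);
```

Returning the removed field names makes it easier to find the template or wrapper that is still setting them.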

3. Never send `reasoning_content` back in the next request

DeepSeek's official page is direct here: if the `reasoning_content` field appears in the sequence of input messages, the API returns a 400 error.

That means the replay rule is not 'store everything and send it back.' The safe pattern is narrower: keep the assistant's final `content` in history, but strip out `reasoning_content` before the next API request.

This is one of the easiest DeepSeek reasoning bugs to self-inflict because developers often mirror entire response objects back into their message store.

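// `response` is the parsed chat-completion result from the previous turn.
const assistantMessage = response.choices[0].message;

// Keep only the final answer in the conversation history.
messages.push({
  role: "assistant",
  content: assistantMessage.content,
});

// Do not replay assistantMessage.reasoning_content in the next request.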
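If your message store already contains mirrored response objects, a defensive filter at send time covers the whole history rather than just the latest turn. The following is a sketch under that assumption; `stripReasoningContent` is a hypothetical helper name, and the stored messages are made up for illustration.

```javascript
// Hypothetical helper: copy a message history without any reasoning_content
// fields. The input array and its message objects are not mutated.
function stripReasoningContent(messages) {
  return messages.map((message) => {
    // Drop reasoning_content and keep every other field as it was.
    const { reasoning_content, ...rest } = message;
    return rest;
  });
}

// Illustrative stored history where a full assistant message was mirrored.
const storedHistory = [
  { role: "user", content: "Which is larger, 9.11 or 9.8?" },
  {
    role: "assistant",
    content: "9.8 is larger.",
    reasoning_content: "Compare the tenths digit: 8 is greater than 1...",
  },
  { role: "user", content: "Explain in one sentence." },
];

const outgoingMessages = stripReasoningContent(storedHistory);
console.log(outgoingMessages);
```

Because the helper returns copies, the store can keep `reasoning_content` for logging while the outgoing payload never carries it.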

4. Streaming separates reasoning tokens from answer tokens

DeepSeek's streaming example shows `delta.reasoning_content` arriving separately from ordinary `delta.content`. That is useful for observability and UI design because a client can collect internal reasoning tokens and visible answer tokens independently.

The important boundary is still the same as the non-streaming path: collect `reasoning_content` if you need it for logging or debugging, but do not concatenate it back into the next request payload.
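The two-buffer pattern described above might look like the following. It is a sketch that assumes each chunk follows the OpenAI-compatible `choices[0].delta` shape. `applyChunk`, `collectStream`, and `mockStream` are hypothetical names, and the mock generator stands in for a real streaming response.

```javascript
// Hypothetical reducer: fold one stream chunk into the two running buffers.
function applyChunk(state, chunk) {
  const delta = chunk.choices?.[0]?.delta ?? {};
  return {
    reasoning: state.reasoning + (delta.reasoning_content ?? ""),
    answer: state.answer + (delta.content ?? ""),
  };
}

// Consume any (async) iterable of chunks and return both buffers.
async function collectStream(stream) {
  let state = { reasoning: "", answer: "" };
  for await (const chunk of stream) {
    state = applyChunk(state, chunk);
  }
  return state;
}

// Mock stream standing in for a real streaming response.
async function* mockStream() {
  yield { choices: [{ delta: { reasoning_content: "Compare tenths: " } }] };
  yield { choices: [{ delta: { reasoning_content: "8 > 1." } }] };
  yield { choices: [{ delta: { content: "9.8 is " } }] };
  yield { choices: [{ delta: { content: "larger." } }] };
}

collectStream(mockStream()).then(({ reasoning, answer }) => {
  // Log the reasoning if needed, but only the answer goes into history.
  console.log("reasoning chars:", reasoning.length);
  const nextHistoryEntry = { role: "assistant", content: answer };
  console.log(nextHistoryEntry);
});
```

Keeping the reducer separate from the loop makes the split easy to unit test without a network call.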

If your current bug involves tool replay rather than plain reasoning replay, continue with `/guides/deepseek-thinking-mode-tool-calls` next.

5. Treat this as a protocol contract, not a prompt-style preference

The official reasoning-model guide is really a protocol document. It tells you which fields are inert, which ones are rejected, how much output the route can produce, and which response field must stay out of subsequent input messages.

That is why this page solves a different support problem from the broader thinking-mode or chat-history guides. The failure class here is not 'the model forgot context'; it is 'the caller violated the reasoning-route contract.'
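Because the rules form a contract, they can be checked mechanically before a request is sent. The sketch below reports violations instead of silently fixing them. `findContractViolations` is a hypothetical helper, and the numeric cap of `64 * 1024` is an assumption about how the documented 64K maximum translates into a token count.

```javascript
// Parameters the guide describes as accepted but without effect.
const NO_EFFECT_PARAMS = ["temperature", "top_p", "presence_penalty", "frequency_penalty"];
// Parameters the guide describes as triggering an error.
const ERROR_PARAMS = ["logprobs", "top_logprobs"];

// Hypothetical pre-flight check. maxAllowedTokens defaults to an assumed
// reading of the documented 64K maximum; adjust it to the current docs.
function findContractViolations(request, maxAllowedTokens = 64 * 1024) {
  const issues = [];

  for (const field of ERROR_PARAMS) {
    if (field in request) {
      issues.push({ level: "error", field, reason: "rejected on the reasoning route" });
    }
  }
  for (const field of NO_EFFECT_PARAMS) {
    if (field in request) {
      issues.push({ level: "warning", field, reason: "accepted but has no effect" });
    }
  }
  (request.messages ?? []).forEach((message, index) => {
    if ("reasoning_content" in message) {
      issues.push({
        level: "error",
        field: `messages[${index}].reasoning_content`,
        reason: "must not appear in input messages",
      });
    }
  });
  if (typeof request.max_tokens === "number" && request.max_tokens > maxAllowedTokens) {
    issues.push({ level: "warning", field: "max_tokens", reason: "above the documented maximum" });
  }
  return issues;
}

console.log(
  findContractViolations({
    temperature: 0.7,
    logprobs: true,
    messages: [{ role: "assistant", content: "Hi", reasoning_content: "..." }],
  })
);
```

Running a check like this in tests or a request middleware turns a runtime 400 into a failure that is caught before deployment.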

FAQ

Why does DeepSeek return a 400 when I replay `reasoning_content`?

DeepSeek's official reasoning-model guide says `reasoning_content` must not appear in input messages. Strip it before the next request.

Does `temperature` work on DeepSeek's reasoning model?

No. The official page says `temperature` is accepted only for compatibility and has no effect on this route.

What is the DeepSeek reasoning-model `max_tokens` limit?

DeepSeek documents a default of 32K and a maximum of 64K, and that budget includes the CoT segment plus the final answer.

Do `logprobs` and `top_logprobs` work on the reasoning model?

No. DeepSeek says those settings will trigger an error on the reasoning-model route.

Does this page create a new purchasable plan on /pricing?

No. It is API behavior guidance only and does not imply any new stocked Coding Plan inventory.

The practical DeepSeek reasoning-model rule is simple: budget `max_tokens` for both CoT and answer, delete inert sampling knobs from the request, and never replay `reasoning_content` back into input messages.
