Updated 2026-06-11

DeepSeek Thinking Mode & Tool Calls in Multi-Turn Apps

DeepSeek's official Thinking Mode guide is more opinionated than many quick snippets suggest. Thinking is enabled by default, some common sampling parameters stop mattering, and `reasoning_content` becomes a real protocol requirement when tool calls enter the conversation. If you ignore that rule, the resulting 400 error is usually an implementation bug, not a model problem.

1. Thinking mode is on by default

DeepSeek's official Thinking Mode guide says the toggle defaults to `enabled`. That matters because teams often assume reasoning is an optional premium mode they have to turn on manually.

The same guide says the default effort is `high` for regular requests and can automatically become `max` for some complex agent requests such as Claude Code or OpenCode. This is a strong signal that DeepSeek expects real agent workflows to lean on reasoning rather than bypass it.

Sources checked

DeepSeek official Thinking Mode guide - Primary source for default-enabled thinking and effort behavior.

2. Some classic sampling knobs stop mattering

DeepSeek's docs say thinking mode does not support `temperature`, `top_p`, `presence_penalty`, or `frequency_penalty`. Setting them will not raise an error, but they also will not change model behavior in thinking mode.

That is a subtle but important operations detail. Many teams think they are tuning thinking quality by changing these fields, when the effective control is really the effort level and whether thinking is enabled at all.

3. reasoning_content is not the same as final content

In thinking mode, DeepSeek returns chain-of-thought material in `reasoning_content` alongside the final answer in `content`. The docs then split multi-turn handling into two cases: no tool call versus tool call.

If there was no tool call between two user turns, the docs say prior `reasoning_content` does not need to be passed back and will be ignored if you do send it. That makes ordinary multi-turn chat simpler than some developers expect.

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
)

reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content

4. Tool calls change the replay rule completely

The official guide is explicit here: if a turn performs tool calls, the intermediate assistant `reasoning_content` must be fully passed back to the API in all subsequent requests.

If you do not replay that reasoning context correctly, DeepSeek says the API will return a 400 error. This is one of the highest-signal debugging rules in the whole docs set because many agent loops silently drop intermediate reasoning while serializing tool state.

In practice, your DeepSeek tool-call adapter needs to store three things together: the assistant reasoning block, the tool invocation, and the tool result. Treat them as one transaction, not as loose fragments.

Sources checked

DeepSeek official Thinking Mode guide - Explains the required replay rule and the 400 error consequence.
DeepSeek official Tool Calls guide - Related guide for multi-turn tool-call handling.

5. OpenAI-format versus Anthropic-format effort controls

DeepSeek documents different control shapes for each protocol. In OpenAI format, the docs use `extra_body.thinking` plus the top-level `reasoning_effort` field. In Anthropic format, the guide maps effort through `output_config.effort`.

That means teams should avoid copying one protocol's request body into another and assuming the same keys will carry over cleanly.

Thinking controls by protocol
Protocol	Thinking toggle	Effort control
OpenAI format	extra_body.thinking	reasoning_effort
Anthropic format	Protocol-level thinking support	output_config.effort

6. Recommended DeepSeek-first implementation pattern

Keep Flash for routine tool steps and wide fan-out, but use Pro for harder reasoning chains where tool planning quality matters more than raw concurrency. Store `reasoning_content` whenever a tool call appears, and test replay with the smallest possible loop before you trust a bigger agent runtime.

If your next problem is endpoint shape, go to `/guides/deepseek-openai-vs-anthropic-api-routing`. If your next problem is concurrency and tenant separation, go to `/guides/deepseek-rate-limit-and-user-id-isolation`.

FAQ

Is DeepSeek thinking mode enabled by default?

Yes. The official Thinking Mode guide says the toggle defaults to enabled.

Does temperature work in thinking mode?

No in practice. DeepSeek says `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are unsupported in thinking mode and have no effect.

When can I ignore reasoning_content?

If there was no tool call between user turns, the docs say prior `reasoning_content` does not need to be passed back and will be ignored if you send it.

When must I replay reasoning_content?

When the turn performed tool calls. DeepSeek says that reasoning content must be passed back in all subsequent requests.

Why am I getting a 400 after a tool call?

One common cause is that your app dropped or mangled the required `reasoning_content` when replaying the conversation after the tool step.

DeepSeek thinking mode is not just a switch for better answers. It is part of the protocol contract for multi-turn agent work. Once tool calls appear, `reasoning_content` becomes state you must preserve, and teams that ignore that rule will eventually debug a self-inflicted 400.

Related model comparisons

Continue from this guide into structured DeepSeek-first comparison pages with model tables, routing advice, and pricing context.

Best AI model for agentic workflows: DeepSeek V4-first routing Best AI model for coding: DeepSeek V4-first comparison DeepSeek V4 vs Claude: 1M context, coding, review quality, and API cost

Get a discounted DeepSeek API key