Updated 2026-06-11
DeepSeek Thinking Mode and Tool Calls: use reasoning correctly in multi-turn apps
DeepSeek's official Thinking Mode guide is more opinionated than many quick snippets suggest. Thinking is enabled by default, some common sampling parameters stop mattering, and `reasoning_content` becomes a real protocol requirement when tool calls enter the conversation. If you ignore that rule, the resulting 400 error is usually an implementation bug, not a model problem.
1. Thinking mode is on by default
DeepSeek's official Thinking Mode guide says the toggle defaults to `enabled`. That matters because teams often assume reasoning is an optional premium mode they have to turn on manually.
The same guide says the default effort is `high` for regular requests and can automatically become `max` for some complex agent requests such as Claude Code or OpenCode. This is a strong signal that DeepSeek expects real agent workflows to lean on reasoning rather than bypass it.
Sources checked
- DeepSeek official Thinking Mode guide - Primary source for default-enabled thinking and effort behavior.
2. Some classic sampling knobs stop mattering
DeepSeek's docs say thinking mode does not support `temperature`, `top_p`, `presence_penalty`, or `frequency_penalty`. Setting them will not raise an error, but they also will not change model behavior in thinking mode.
That is a subtle but important operations detail. Many teams think they are tuning thinking quality by changing these fields, when the effective control is really the effort level and whether thinking is enabled at all.
3. reasoning_content is not the same as final content
In thinking mode, DeepSeek returns chain-of-thought material in `reasoning_content` alongside the final answer in `content`. The docs then split multi-turn handling into two cases: no tool call versus tool call.
If there was no tool call between two user turns, the docs say prior `reasoning_content` does not need to be passed back and will be ignored if you do send it. That makes ordinary multi-turn chat simpler than some developers expect.
response = client.chat.completions.create(
model="deepseek-v4-pro",
messages=messages,
reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},
)
reasoning_content = response.choices[0].message.reasoning_content
content = response.choices[0].message.content4. Tool calls change the replay rule completely
The official guide is explicit here: if a turn performs tool calls, the intermediate assistant `reasoning_content` must be fully passed back to the API in all subsequent requests.
If you do not replay that reasoning context correctly, DeepSeek says the API will return a 400 error. This is one of the highest-signal debugging rules in the whole docs set because many agent loops silently drop intermediate reasoning while serializing tool state.
In practice, your DeepSeek tool-call adapter needs to store three things together: the assistant reasoning block, the tool invocation, and the tool result. Treat them as one transaction, not as loose fragments.
Sources checked
- DeepSeek official Thinking Mode guide - Explains the required replay rule and the 400 error consequence.
- DeepSeek official Tool Calls guide - Related guide for multi-turn tool-call handling.
5. OpenAI-format versus Anthropic-format effort controls
DeepSeek documents different control shapes for each protocol. In OpenAI format, the docs use `extra_body.thinking` plus the top-level `reasoning_effort` field. In Anthropic format, the guide maps effort through `output_config.effort`.
That means teams should avoid copying one protocol's request body into another and assuming the same keys will carry over cleanly.
| Protocol | Thinking toggle | Effort control |
|---|---|---|
| OpenAI format | extra_body.thinking | reasoning_effort |
| Anthropic format | Protocol-level thinking support | output_config.effort |
6. Recommended DeepSeek-first implementation pattern
Keep Flash for routine tool steps and wide fan-out, but use Pro for harder reasoning chains where tool planning quality matters more than raw concurrency. Store `reasoning_content` whenever a tool call appears, and test replay with the smallest possible loop before you trust a bigger agent runtime.
If your next problem is endpoint shape, go to `/guides/deepseek-openai-vs-anthropic-api-routing`. If your next problem is concurrency and tenant separation, go to `/guides/deepseek-rate-limit-and-user-id-isolation`.
FAQ
Is DeepSeek thinking mode enabled by default?
Yes. The official Thinking Mode guide says the toggle defaults to enabled.
Does temperature work in thinking mode?
No in practice. DeepSeek says `temperature`, `top_p`, `presence_penalty`, and `frequency_penalty` are unsupported in thinking mode and have no effect.
When can I ignore reasoning_content?
If there was no tool call between user turns, the docs say prior `reasoning_content` does not need to be passed back and will be ignored if you send it.
When must I replay reasoning_content?
When the turn performed tool calls. DeepSeek says that reasoning content must be passed back in all subsequent requests.
Why am I getting a 400 after a tool call?
One common cause is that your app dropped or mangled the required `reasoning_content` when replaying the conversation after the tool step.
DeepSeek thinking mode is not just a switch for better answers. It is part of the protocol contract for multi-turn agent work. Once tool calls appear, `reasoning_content` becomes state you must preserve, and teams that ignore that rule will eventually debug a self-inflicted 400.
Related model comparisons
Continue from this guide into structured DeepSeek-first comparison pages with model tables, routing advice, and pricing context.