Updated 2026-04-25
How to Use DeepSeek in Python — From Zero to Production
DeepSeek V4 exposes an OpenAI-compatible API, which means you can call it from Python without learning a new SDK: the standard openai package works as-is, you just change the base URL and the API key. This tutorial walks you through installing the SDK, making your first chat completion, streaming tokens, using function calling for agent workflows, and managing cost at scale.
1. Install the SDK and set your API key
You do not need a DeepSeek-specific package. The official openai Python SDK speaks the DeepSeek endpoint natively. Install it with pip, then load the API key from an environment variable — never hardcode it.
Get a production-grade API key from the DeepSeek console, or use our discounted official keys from /pricing (same interface, lower price). Expose it to your code as DEEPSEEK_API_KEY, either exported in your shell or stored in a .env file that stays out of version control.
pip install "openai>=1.40.0" python-dotenv
echo "DEEPSEEK_API_KEY=sk-..." >> .env
2. Your first DeepSeek chat completion
Create an OpenAI client and point base_url at the DeepSeek endpoint. The chat.completions.create call is identical to OpenAI; only the model name changes. Start with `deepseek-v4-flash` for everyday traffic and switch to `deepseek-v4-pro` for harder reasoning or review-heavy tasks.
import os
from dotenv import load_dotenv
from openai import OpenAI
load_dotenv()
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)
response = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Write a Python one-liner to flatten a nested list."},
    ],
    temperature=0.2,
)
print(response.choices[0].message.content)
print("tokens used:", response.usage.total_tokens)
3. Streaming responses for low-latency UIs
For chat UIs, CLIs, or agent loops you want tokens to appear as they are generated. Pass stream=True and iterate over the chunks. Each chunk is an OpenAI-compatible delta object.
stream = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Explain MoE architectures in 3 sentences."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
4. Function calling (tool use) with DeepSeek V4
DeepSeek V4 supports OpenAI-style tool use. You declare a JSON schema for each tool, the model decides when to call it, and you execute the tool and return the result for the next turn. V4 is significantly more reliable at this than V3, so agent loops break less often.
Always validate the tool arguments before executing the call — treat them as untrusted input, exactly as you would for user input.
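A minimal sketch of that validation, assuming a tool named get_weather that takes a single city string, like the one declared in this section. The helper name parse_weather_args and the 100-character limit are illustrative choices, not part of any SDK. It relies on the arguments arriving as a JSON string, as they do in OpenAI-style tool calls.

```python
import json


def parse_weather_args(raw_arguments: str) -> dict:
    """Parse and validate the JSON arguments for the get_weather tool.

    Hypothetical helper: raises ValueError on anything unexpected so the
    caller can refuse to execute the tool.
    """
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        raise ValueError(f"tool arguments are not valid JSON: {exc}") from exc

    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")

    # Reject keys the schema does not declare.
    unexpected = set(args) - {"city"}
    if unexpected:
        raise ValueError(f"unexpected argument(s): {sorted(unexpected)}")

    city = args.get("city")
    if not isinstance(city, str) or not city.strip():
        raise ValueError("'city' must be a non-empty string")
    if len(city) > 100:  # arbitrary illustrative limit
        raise ValueError("'city' is too long")

    return {"city": city.strip()}
```

Calling parse_weather_args(call.function.arguments) before running the real tool keeps malformed or unexpected model output away from your own code paths.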
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=messages,
    tools=tools,
)
call = response.choices[0].message.tool_calls[0]
# → call.function.name == "get_weather"
# → call.function.arguments == '{"city":"Tokyo"}'
5. Controlling cost: tokens, context window, caching
DeepSeek bills input and output tokens separately. Under the current promotional pricing (through May 31, 2026), the English official price table lists DeepSeek V4 Pro at $0.435 per 1M cache-miss input tokens and $0.87 per 1M output tokens. Cache-hit input drops to $0.003625 per 1M tokens (RMB 0.025 on the Chinese table) for repeated system prompts and retrieval blocks.
DeepSeek V4 Flash is the faster, cheaper route at $0.14 per 1M input tokens and $0.28 per 1M output tokens, with cache-hit input dropping to just $0.0028 per 1M. Its quality is strong enough for everyday coding, chat, retrieval, and repeated tool steps.
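To make the arithmetic concrete, here is a rough cost estimator built from the prices quoted above. It is a sketch only: the table and function names are hypothetical, the Flash input price is assumed to be the cache-miss rate, and the figures will go stale when the promotional pricing ends.

```python
# USD per 1M tokens, copied from the prices quoted in this guide.
# Assumption: the Flash "input" price is the cache-miss rate.
PRICES_PER_MILLION = {
    "deepseek-v4-pro": {"input_miss": 0.435, "input_hit": 0.003625, "output": 0.87},
    "deepseek-v4-flash": {"input_miss": 0.14, "input_hit": 0.0028, "output": 0.28},
}


def estimate_cost_usd(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the cost of one request in USD.

    cached_input_tokens is the part of input_tokens served from the cache.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    price = PRICES_PER_MILLION[model]
    miss_tokens = input_tokens - cached_input_tokens
    total = (
        miss_tokens * price["input_miss"]
        + cached_input_tokens * price["input_hit"]
        + output_tokens * price["output"]
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Same workload on both routes: 2,000 input tokens (1,500 cached), 500 output.
    for name in PRICES_PER_MILLION:
        print(name, f"{estimate_cost_usd(name, 2_000, 500, 1_500):.8f}")
```

Feeding it the prompt_tokens and completion_tokens values from response.usage gives a per-request figure you can aggregate per route.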
Three practical tactics: (1) keep the system prompt short and stable (it can be cached), (2) summarise older turns and drop raw history once the chat exceeds ~6k tokens, (3) cap max_tokens on the output to avoid the model running away when it should be terse.
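The second tactic can be sketched as a small history trimmer. This version only drops old turns; the summarisation step is left out. The approx_tokens heuristic (about four characters per token) is an assumption and not DeepSeek's tokenizer, and the sketch assumes every message has a string content field with system messages placed first.

```python
def approx_tokens(text: str) -> int:
    """Very rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages, budget=6000):
    """Keep system messages plus the most recent turns that fit in `budget`.

    The newest non-system message is always kept, even if it alone exceeds
    the budget, so the model still sees the latest request.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    used = sum(approx_tokens(m["content"]) for m in system)
    kept = []
    # Walk backwards from the newest turn and stop once the budget is spent.
    for message in reversed(rest):
        cost = approx_tokens(message["content"])
        if kept and used + cost > budget:
            break
        kept.append(message)
        used += cost

    kept.reverse()  # restore chronological order
    return system + kept
```

Pair it with an explicit max_tokens argument on chat.completions.create to bound the output side as well.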
For teams burning millions of tokens a day, the /pricing discounted official keys typically cut another 20–40% off the list price without any code change.
6. Production checklist
Timeouts and retries: wrap every call with a timeout (15–30s for chat, 60–120s for long reasoning) and retry on 429 / 5xx with exponential backoff.
Structured output: use response_format={"type":"json_object"} when you need machine-parseable answers, and validate with Pydantic or jsonschema.
Observability: log model, prompt_tokens, completion_tokens, latency, and a truncated prompt hash. You will thank yourself the first time the bill spikes.
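The retry and logging items can be combined in one small wrapper. The sketch below is a hypothetical helper, not part of the openai package: it retries a callable on a caller-supplied set of exception types with exponential backoff plus jitter, and logs latency on success. The default of four attempts and a one-second base delay are illustrative values.

```python
import logging
import random
import time

logger = logging.getLogger("llm_calls")


def call_with_retry(fn, *, retryable=(Exception,), max_attempts=4,
                    base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on `retryable` errors with exponential backoff.

    Hypothetical helper. `sleep` is injectable so tests can skip real waits.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(1, max_attempts + 1):
        started = time.monotonic()
        try:
            result = fn()
        except retryable as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay grows 1x, 2x, 4x ... base_delay, plus random jitter so
            # many clients do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            logger.warning("attempt %d failed (%s); retrying in %.1fs",
                           attempt, type(exc).__name__, delay)
            sleep(delay)
        else:
            logger.info("attempt %d succeeded in %.2fs",
                        attempt, time.monotonic() - started)
            return result
```

With the openai package, fn would typically be a lambda wrapping client.chat.completions.create(...), and retryable a tuple such as (openai.RateLimitError, openai.APITimeoutError, openai.InternalServerError). The client constructor also accepts timeout and max_retries arguments, which may be enough on their own for simple cases.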
FAQ
Do I need a special DeepSeek Python SDK?
No. The standard openai package works with DeepSeek — just change base_url and api_key. Any OpenAI-compatible library (LangChain, LlamaIndex, Instructor) also works.
Which model name should I use?
Use `deepseek-v4-flash` as the default route for chat, streaming, and repeated tool steps. Use `deepseek-v4-pro` when you need stronger reasoning, review quality, or harder coding chains.
Can I use async in Python?
Yes. Use openai.AsyncOpenAI instead of OpenAI and await the call. Everything else is identical.
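A minimal sketch of the async pattern, written against a client passed in as a parameter so the block has no third-party imports. The client is assumed to be an openai.AsyncOpenAI instance configured with the DeepSeek base_url; ask, ask_many, and the concurrency limit of 5 are illustrative names and values.

```python
import asyncio


async def ask(client, prompt, model="deepseek-v4-flash"):
    """Send one prompt and return the reply text.

    `client` is assumed to expose the AsyncOpenAI chat.completions interface.
    """
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


async def ask_many(client, prompts, max_concurrency=5):
    """Run several prompts concurrently, capped at max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt):
        async with semaphore:
            return await ask(client, prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(bounded(p) for p in prompts))
```

In a script you would build the client once, for example with AsyncOpenAI(api_key=..., base_url="https://api.deepseek.com/v1"), and run asyncio.run(ask_many(client, prompts)). The semaphore keeps a large batch from tripping rate limits all at once.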
How do I get a cheaper API key?
Discounted official DeepSeek API keys are listed on /pricing — same API, same context window, lower price.
Does streaming cost more?
No. Streaming only changes transport; pricing is still per-token and identical to non-streaming calls.
Python + DeepSeek V4 is the fastest path to a production LLM stack in 2026: one SDK you already know, named Pro and Flash routes you can actually deploy, and a price tag small teams can still afford against GPT 5.4 and Claude.