Updated 2026-04-15

How to Use DeepSeek in Python — From Zero to Production

DeepSeek V4 exposes an OpenAI-compatible API, which means you can call it from Python without learning a new SDK: the standard openai package works as-is; you just change the base URL and the API key. This tutorial walks you through installing the SDK, making your first chat completion, streaming tokens, using function calling for agent workflows, and managing cost at scale.

1. Install the SDK and set your API key

You do not need a DeepSeek-specific package. The official openai Python SDK speaks the DeepSeek endpoint natively. Install it with pip, then load the API key from an environment variable — never hardcode it.

Get a production-grade API key from the DeepSeek console, or use our discounted official keys from /pricing (same interface, lower price). Export it as DEEPSEEK_API_KEY, either in your shell or in a .env file as below.

pip install "openai>=1.40.0" python-dotenv
echo "DEEPSEEK_API_KEY=sk-..." >> .env

2. Your first DeepSeek chat completion

Create an OpenAI client and point base_url at the DeepSeek endpoint. The chat.completions.create call is identical to OpenAI's; only the model name changes (deepseek-chat for the generalist, deepseek-reasoner for the reasoning-heavy variant).

import os
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Write a Python one-liner to flatten a nested list."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
print("tokens used:", response.usage.total_tokens)

3. Streaming responses for low-latency UIs

For chat UIs, CLIs, or agent loops, you want tokens to appear as they are generated. Pass stream=True and iterate over the chunks; each chunk is an OpenAI-compatible delta object.

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Explain MoE architectures in 3 sentences."}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
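
In practice you usually also want the full reply, for logging, caching, or appending to the conversation. A minimal sketch that wraps the loop in a helper (stream_reply is our own name, not an SDK function), reusing the client from step 2:

def stream_reply(messages, model="deepseek-chat"):
    """Print tokens as they arrive and return the complete reply text."""
    stream = client.chat.completions.create(model=model, messages=messages, stream=True)
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            print(delta, end="", flush=True)
    print()  # terminate the line once the stream closes
    return "".join(parts)

reply = stream_reply([{"role": "user", "content": "Explain MoE architectures in 3 sentences."}])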

4. Function calling (tool use) with DeepSeek V4

DeepSeek V4 supports OpenAI-style tool use. You declare a JSON schema for each tool, the model decides when to call it, and you execute the tool and return the result for the next turn. V4 is significantly more reliable at this than V3, so agent loops break less often.

Always validate the tool arguments before executing the call — treat them as untrusted input, exactly as you would for user input.

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=messages,
    tools=tools,
)

call = response.choices[0].message.tool_calls[0]  # tool_calls is None when the model answers directly; check before indexing
# → call.function.name == "get_weather"
# → call.function.arguments == '{"city":"Tokyo"}'
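
To close the loop, parse and validate the arguments, execute the tool, and send the result back as a "tool" message so the model can produce the final answer. A minimal sketch; get_weather here is a stub you would replace with a real lookup:

import json

def get_weather(city: str) -> str:
    return f"Sunny, 22°C in {city}"  # stub; swap in a real weather API

args = json.loads(call.function.arguments)    # arguments arrive as a JSON string
if not isinstance(args.get("city"), str):     # validate: this is untrusted input
    raise ValueError(f"bad tool arguments: {call.function.arguments}")

messages.append(response.choices[0].message)  # the assistant turn containing the tool call
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": get_weather(args["city"]),
})

final = client.chat.completions.create(model="deepseek-chat", messages=messages, tools=tools)
print(final.choices[0].message.content)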

5. Controlling cost: tokens, context window, caching

DeepSeek bills input and output tokens separately, with output roughly 2× the price of input. Every system prompt and every previous message in the conversation counts as input tokens, so long chats get expensive fast if you never truncate.

Three practical tactics: (1) keep the system prompt short and stable (it can be cached), (2) summarise older turns and drop raw history once the chat exceeds ~6k tokens, (3) cap max_tokens on the output to avoid the model running away when it should be terse.
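
A sketch of tactics (2) and (3), assuming messages[0] is the system prompt and contents are plain strings; the 6k budget, the 4-characters-per-token heuristic, and the summary prompt are illustrative choices, not DeepSeek recommendations:

def approx_tokens(messages):
    return sum(len(m["content"]) for m in messages) // 4  # crude estimate; use a real tokenizer for billing-grade counts

def compact_history(messages, budget=6_000):
    """Replace older turns with a summary once the chat exceeds the token budget."""
    if approx_tokens(messages) <= budget or len(messages) < 6:
        return messages
    system, old, recent = messages[0], messages[1:-4], messages[-4:]
    summary = client.chat.completions.create(
        model="deepseek-chat",
        messages=old + [{"role": "user", "content": "Summarise the conversation so far in under 150 words."}],
        max_tokens=200,  # tactic (3): cap the output
    ).choices[0].message.content
    return [system, {"role": "system", "content": f"Summary of earlier turns: {summary}"}] + recent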

For teams burning millions of tokens a day, the /pricing discounted official keys typically cut another 20–40% off the list price without any code change.

6. Production checklist

Timeouts and retries: wrap every call with a timeout (15–30s for chat, 60–120s for long reasoning) and retry on 429 / 5xx with exponential backoff.
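
Both knobs are built into the openai SDK: set timeout and max_retries on the client (the SDK already backs off exponentially on 429 and 5xx), and use with_options for per-call overrides. A minimal sketch:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
    timeout=30.0,    # seconds; a chat-sized budget
    max_retries=3,   # retried automatically with exponential backoff
)

# Give long reasoning calls a bigger budget, per call:
response = client.with_options(timeout=120.0).chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
)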

Structured output: use response_format={"type":"json_object"} when you need machine-parseable answers, and validate with Pydantic or jsonschema.
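
A sketch pairing JSON mode with Pydantic (v2) validation; the Ticket schema and the fallback are just examples. As with OpenAI's API, JSON mode generally expects the prompt itself to mention JSON:

from pydantic import BaseModel, ValidationError

class Ticket(BaseModel):
    title: str
    priority: int  # 1 (low) to 5 (urgent)

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": 'Reply with JSON: {"title": str, "priority": int}.'},
        {"role": "user", "content": "The login page 500s for every user since the deploy."},
    ],
    response_format={"type": "json_object"},
)

try:
    ticket = Ticket.model_validate_json(response.choices[0].message.content)
except ValidationError:
    ticket = None  # retry or fall back; models occasionally return off-schema JSON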

Observability: log model, prompt_tokens, completion_tokens, latency, and a truncated prompt hash. You will thank yourself the first time the bill spikes.
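
A minimal wrapper along those lines; the field names and the 16-character hash truncation are our own conventions:

import hashlib
import json
import logging
import time

log = logging.getLogger("llm")

def logged_chat(**kwargs):
    t0 = time.monotonic()
    response = client.chat.completions.create(**kwargs)
    log.info(json.dumps({
        "model": kwargs["model"],
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "latency_ms": round((time.monotonic() - t0) * 1000),
        # hash, not raw text, so logs never leak prompt contents
        "prompt_hash": hashlib.sha256(json.dumps(kwargs["messages"]).encode()).hexdigest()[:16],
    }))
    return response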

FAQ

Do I need a special DeepSeek Python SDK?

No. The standard openai package works with DeepSeek — just change base_url and api_key. Any OpenAI-compatible library (LangChain, LlamaIndex, Instructor) also works.

Which model name should I use?

deepseek-chat is the everyday generalist; deepseek-reasoner is tuned for math, logic, and long chain-of-thought. Start with deepseek-chat and switch only when you need deeper reasoning.

Can I use async in Python?

Yes. Use openai.AsyncOpenAI instead of OpenAI and await the call. Everything else is identical.
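
A minimal sketch:

import asyncio
import os
from openai import AsyncOpenAI

aclient = AsyncOpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/v1",
)

async def main():
    response = await aclient.chat.completions.create(
        model="deepseek-chat",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(response.choices[0].message.content)

asyncio.run(main())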

How do I get a cheaper API key?

Discounted official DeepSeek API keys are listed on /pricing — same API, same context window, lower price.

Does streaming cost more?

No. Streaming only changes transport; pricing is still per-token and identical to non-streaming calls.

Python + DeepSeek V4 is the fastest path to a production LLM stack in 2026: one SDK you already know, a model that holds its own against GPT-4o and Claude Sonnet, and a price tag small teams can actually afford.