Migrating the DeepSeek API to deepseek-v4-flash / deepseek-v4-pro: Choosing Between OpenAI- and Anthropic-Compatible Formats

2026-06-14

The first thing to change in a DeepSeek API integration is not the prompt but the model name. DeepSeek’s Chinese pricing page states it plainly: deepseek-chat and deepseek-reasoner will be deprecated at 23:59 Beijing time on 2026/07/24. During the compatibility period, the former maps to the non-thinking mode of deepseek-v4-flash and the latter to its thinking mode (DeepSeek pricing page). If your production code still uses the old names, don’t wait until the final week.

Figure: Timeline from the current checkpoint (2026-06-14) to the deprecation point (2026-07-24 23:59 Beijing time). The old model names fade out and the new model names are highlighted.
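To see how much of a codebase still depends on the old names, a quick scan helps. The following is a minimal sketch, not an official tool: the function name `find_deprecated_models` and the list of file suffixes are my own assumptions, so adjust them to your repository.

```python
import re
from pathlib import Path

# The two model names the pricing page lists as deprecated.
DEPRECATED_MODELS = ("deepseek-chat", "deepseek-reasoner")
PATTERN = re.compile("|".join(re.escape(name) for name in DEPRECATED_MODELS))

# Hypothetical choice of file types to scan; extend as needed.
SCANNED_SUFFIXES = {".py", ".ts", ".js", ".json", ".yaml", ".yml", ".toml"}


def find_deprecated_models(root):
    """Return (path, line_number, model_name) for every hit under root."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in SCANNED_SUFFIXES:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_number, line in enumerate(text.splitlines(), start=1):
            for match in PATTERN.finditer(line):
                hits.append((str(path), line_number, match.group(0)))
    return hits
```

Calling `find_deprecated_models(".")` from the repository root returns one tuple per occurrence, which is enough to build a checklist before the deadline.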

Choose the API format first: follow your ecosystem, not your beliefs

DeepSeek now provides two compatible entry points: the OpenAI format at https://api.deepseek.com, and the Anthropic format at https://api.deepseek.com/anthropic (DeepSeek first API call).

My recommendation is simple:

Your current situation	Which to choose	Why
Already using OpenAI SDK, LangChain, LlamaIndex, or Vercel AI SDK Chat Completions	OpenAI format	Requires the fewest changes to `base_url` and `model`
Already using the Anthropic SDK, Claude Code, or the Messages API structure	Anthropic format	Your habits around `system`, `messages.create`, and `max_tokens` stay the same
You wrote your own HTTP wrapper	Prefer the OpenAI format	More debugging resources, and the fields are more universal
You want to reuse the Claude toolchain	Anthropic format	DeepSeek explicitly supports the Anthropic API ecosystem (DeepSeek Anthropic API)

One gotcha: under the Anthropic format, DeepSeek performs model-name mapping. The official documentation states that names starting with claude-opus will be mapped to deepseek-v4-pro, while names starting with claude-haiku or claude-sonnet will be mapped to deepseek-v4-flash. I still recommend explicitly writing deepseek-v4-pro or deepseek-v4-flash instead of leaving production behavior to implicit mapping.
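To make that mapping explicit on your side, you can resolve Claude-style names before the request leaves your code. This sketch only mirrors the prefix rules quoted above; `resolve_model` is a hypothetical helper, and DeepSeek’s actual server-side routing may handle other names differently.

```python
# Prefix rules as quoted from the documentation above; illustrative only.
ANTHROPIC_PREFIX_MAP = (
    ("claude-opus", "deepseek-v4-pro"),
    ("claude-haiku", "deepseek-v4-flash"),
    ("claude-sonnet", "deepseek-v4-flash"),
)


def resolve_model(name):
    """Return an explicit DeepSeek model name for a Claude-style name.

    Names that match no prefix rule are returned unchanged.
    """
    for prefix, target in ANTHROPIC_PREFIX_MAP:
        if name.startswith(prefix):
            return target
    return name
```

Logging the resolved name next to each request makes it obvious which model production traffic actually hits.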

Model-name replacement: stop relying on compatibility aliases

The migration table has only two rows:

Old model name	Currently compatible with	Recommended usage
`deepseek-chat`	`deepseek-v4-flash` non-thinking mode	`deepseek-v4-flash` + disable thinking
`deepseek-reasoner`	`deepseek-v4-flash` thinking mode	`deepseek-v4-flash` or `deepseek-v4-pro` + enable thinking

If you previously used deepseek-reasoner for code review, complex SQL, or long-form reasoning, this is a good opportunity to evaluate deepseek-v4-pro. For customer service, summarization, or classification, deepseek-v4-flash is the more sensible default.
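The table can be written down as a small translation function for OpenAI-format requests. This is a sketch under assumptions: `migrate_chat_kwargs` and its `prefer_pro` flag are names I made up, and the `thinking` payload follows the shape used in the OpenAI-format example in this article.

```python
def migrate_chat_kwargs(old_model, prefer_pro=False):
    """Translate a deprecated model name into kwargs for chat.completions.create.

    Names that are not deprecated pass through unchanged, so the helper
    can be applied repeatedly without side effects.
    """
    if old_model == "deepseek-chat":
        return {
            "model": "deepseek-v4-flash",
            "extra_body": {"thinking": {"type": "disabled"}},
        }
    if old_model == "deepseek-reasoner":
        # prefer_pro is a hypothetical switch for workloads worth the higher price.
        model = "deepseek-v4-pro" if prefer_pro else "deepseek-v4-flash"
        return {
            "model": model,
            "extra_body": {"thinking": {"type": "enabled"}},
        }
    return {"model": old_model}
```

A call site would then look like `client.chat.completions.create(messages=messages, **migrate_chat_kwargs("deepseek-reasoner"))`.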

Figure: Model family migration. The old models deepseek-chat and deepseek-reasoner are on the left, the new models deepseek-v4-flash and deepseek-v4-pro on the right; different line styles indicate non-thinking and thinking modes, and the recommended path is highlighted.

OpenAI format: the minimal-change version

DeepSeek’s OpenAI format still uses Chat Completions. OpenAI’s official API itself also follows the message-list style of POST /v1/chat/completions (OpenAI API Reference), so most SDKs only need two changes.

# pip install openai
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
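    messages=[
        {"role": "system", "content": "You are a rigorous code review assistant."},
        {"role": "user", "content": "Check this Python code for potential bugs."},
    ],
    # `thinking` is not a native OpenAI SDK argument, so it goes through extra_body
    extra_body={"thinking": {"type": "disabled"}},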
    stream=False,
)

print(resp.choices[0].message.content)

To enable thinking mode, replace the extra_body line with:

reasoning_effort="high",
extra_body={"thinking": {"type": "enabled"}},

DeepSeek’s thinking mode is enabled by default. In the OpenAI SDK, thinking should be placed inside extra_body; thinking effort supports high and max (DeepSeek thinking mode). If your tool-calling chain sends assistant messages back, remember one hard rule: for thinking-mode turns involving tool calls, subsequent requests must send back the complete reasoning_content, or you will get a 400.
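The pass-through rule is easy to get wrong when you rebuild the message list by hand. Here is a minimal sketch that works on plain dicts, for example the result of calling `model_dump()` on the SDK’s message object. The helper names are hypothetical, and dropping `reasoning_content` from turns without tool calls is my own assumption rather than something stated above.

```python
def assistant_turn_for_history(message):
    """Build the assistant entry to append to `messages` for the next request."""
    turn = {"role": "assistant", "content": message.get("content")}
    tool_calls = message.get("tool_calls")
    if tool_calls:
        turn["tool_calls"] = tool_calls
        # Keep the reasoning verbatim for tool-call turns; do not truncate it.
        if message.get("reasoning_content") is not None:
            turn["reasoning_content"] = message["reasoning_content"]
    return turn


def append_tool_round(messages, assistant_message, tool_outputs):
    """Return a new message list with the assistant turn and its tool results.

    tool_outputs maps each tool_call id to the string result of that call.
    """
    updated = list(messages)
    updated.append(assistant_turn_for_history(assistant_message))
    for call in assistant_message.get("tool_calls") or []:
        updated.append(
            {
                "role": "tool",
                "tool_call_id": call["id"],
                "content": tool_outputs[call["id"]],
            }
        )
    return updated
```

Keeping this logic in one function means a later change in the field rules touches a single place instead of every agent loop.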

Anthropic format: leave a back door for the Claude toolchain

If you have already built around the Anthropic Messages API with system prompts, max_tokens, and client.messages.create(), just change the Base URL:

# pip install anthropic
import os
import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/anthropic",
)

msg = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1000,
    system="You are a senior backend engineer.",
    messages=[{"role": "user", "content": "Give me a fix for Redis cache penetration."}],
    thinking={"type": "enabled"},
    output_config={"effort": "high"},
)

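# msg.content is a list of content blocks; print only the text blocks
for block in msg.content:
    if block.type == "text":
        print(block.text)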

Anthropic’s official Messages API also uses the messages.create structure, with core fields including model, max_tokens, system, and messages (Anthropic Messages API). So the real reason to choose the Anthropic format is not that it is “more advanced,” but that it requires fewer changes to Claude ecosystem code.

Cost estimation: start by assuming cache misses

DeepSeek’s Chinese pricing page charges per million tokens: for deepseek-v4-flash, input cache hits cost 0.02 yuan, input cache misses cost 1 yuan, and output costs 2 yuan; for deepseek-v4-pro, input cache hits cost 0.025 yuan, input cache misses cost 3 yuan, and output costs 6 yuan (DeepSeek pricing page).

A rough calculation: 200M input tokens and 50M output tokens per month, first assuming all input tokens are cache misses.

Model	Input cost	Output cost	Total
`deepseek-v4-flash`	200 yuan	100 yuan	300 yuan
`deepseek-v4-pro`	600 yuan	300 yuan	900 yuan

The actual bill depends on the cache hit rate. When building RAG pipelines, agents, or anything with a long system prompt, put the stable prefix at the front of the request. Once the hit rate rises, input costs drop significantly. Still, don’t budget with the most optimistic cache numbers on the first day of migration.
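The arithmetic behind the table, plus the effect of cache hits, fits in a few lines. This is a rough estimator, not a billing tool: the prices are the per-million-token figures quoted above, while `monthly_cost_yuan` and `cache_hit_rate` are names I introduced for illustration.

```python
# Yuan per million tokens, as quoted from the pricing page above.
PRICES_YUAN_PER_MILLION = {
    "deepseek-v4-flash": {"hit": 0.02, "miss": 1.0, "output": 2.0},
    "deepseek-v4-pro": {"hit": 0.025, "miss": 3.0, "output": 6.0},
}


def monthly_cost_yuan(model, input_millions, output_millions, cache_hit_rate=0.0):
    """Estimate monthly cost in yuan; token counts are given in millions."""
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    price = PRICES_YUAN_PER_MILLION[model]
    hit_millions = input_millions * cache_hit_rate
    miss_millions = input_millions - hit_millions
    return (
        hit_millions * price["hit"]
        + miss_millions * price["miss"]
        + output_millions * price["output"]
    )
```

With the default `cache_hit_rate=0.0`, the function reproduces the table: 300 yuan for deepseek-v4-flash and 900 yuan for deepseek-v4-pro. If half of the input tokens hit the cache, the flash estimate falls to about 202 yuan.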

Relay API users: onehop is the easier path

If your goal is not only to connect to DeepSeek, but to switch between Claude, GPT, and Gemini within the same codebase, onehop is a time-saving path: it is OpenAI/Anthropic compatible, the Base URL changes to https://api.onehop.ai/v1, new accounts get $10, pricing is lower than official rates, and no card binding is required.

OpenAI SDK example:

from openai import OpenAI
import os

client = OpenAI(
    api_key=os.environ["ONEHOP_API_KEY"],
    base_url="https://api.onehop.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Explain this API migration plan in three points."}],
)
print(resp.choices[0].message.content)

The Anthropic SDK can also point to the same Base URL:

import anthropic, os

client = anthropic.Anthropic(
    api_key=os.environ["ONEHOP_API_KEY"],
    base_url="https://api.onehop.ai/v1",
)

For a production migration, I would schedule it like this: replace the old model names today; run smoke tests for both the OpenAI and Anthropic paths this week; finish the thinking-mode and tool-call reasoning_content pass-through logic before the end of the month; and delete every instance of deepseek-chat and deepseek-reasoner before mid-July.

If you want to maintain fewer vendor configurations, you can call Claude and other models through onehop, or register first to get $10 in trial credits and get the workflow running.
