Migrating the DeepSeek API to deepseek-v4-flash / deepseek-v4-pro: Choosing Between OpenAI- and Anthropic-Compatible Formats
June 14, 2026 · 12 min read · Claude / GPT / Gemini / DeepSeek

When looking at the DeepSeek API, the first thing you should change is not the prompt but the model name. DeepSeek’s Chinese pricing page states it plainly: deepseek-chat and deepseek-reasoner will be deprecated at 23:59 Beijing time on 2026/07/24; during the compatibility period, the former maps to the non-thinking mode of deepseek-v4-flash, and the latter maps to the thinking mode of deepseek-v4-flash (DeepSeek pricing page). If your production code still uses the old names, don’t wait until the final week.

Choose the API format first: follow your ecosystem, not your beliefs
DeepSeek now provides two compatible entry points: the OpenAI format at https://api.deepseek.com, and the Anthropic format at https://api.deepseek.com/anthropic (DeepSeek first API call).
My recommendation is simple:
| Your current situation | Which to choose | Why |
|---|---|---|
| Already using the OpenAI SDK, LangChain, LlamaIndex, or Vercel AI SDK Chat Completions | OpenAI format | Fewest changes: only base_url and model |
| Already using the Anthropic SDK, Claude Code, or the Messages API structure | Anthropic format | Your habits around system, messages.create, and max_tokens stay the same |
| You wrote your own HTTP wrapper | Prefer the OpenAI format | More debugging resources, and the fields are more universal |
| You want to reuse the Claude toolchain | Anthropic format | DeepSeek explicitly supports the Anthropic API ecosystem (DeepSeek Anthropic API) |
One gotcha: under the Anthropic format, DeepSeek performs model-name mapping. The official documentation states that names starting with claude-opus will be mapped to deepseek-v4-pro, while names starting with claude-haiku or claude-sonnet will be mapped to deepseek-v4-flash. I still recommend explicitly writing deepseek-v4-pro or deepseek-v4-flash instead of leaving production behavior to implicit mapping.
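If you want that mapping visible in your own code instead of hidden on the server, a small resolver is enough. This is a minimal client-side sketch of the prefix rules described above; the function name `resolve_anthropic_model` is hypothetical, and it only mirrors what the documentation is quoted as saying rather than calling anything on DeepSeek's side.

```python
def resolve_anthropic_model(requested: str) -> str:
    """Return the DeepSeek model a name should resolve to.

    Mirrors the documented prefix mapping so the result can be logged
    and sent explicitly instead of relying on implicit server mapping.
    """
    # Explicit DeepSeek names pass through untouched.
    if requested.startswith("deepseek-"):
        return requested
    if requested.startswith("claude-opus"):
        return "deepseek-v4-pro"
    if requested.startswith(("claude-haiku", "claude-sonnet")):
        return "deepseek-v4-flash"
    # Assumption: anything else is treated as a configuration error here.
    raise ValueError(f"no known mapping for model name: {requested!r}")


if __name__ == "__main__":
    for name in ("claude-opus-4", "claude-sonnet-4-5", "deepseek-v4-pro"):
        print(name, "->", resolve_anthropic_model(name))
```

Resolving the name once at startup and logging the result makes it obvious which DeepSeek model a Claude-style configuration actually ends up using.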
Model-name replacement: stop relying on compatibility aliases
The migration table has only two rows:
| Old model name | Currently compatible with | Recommended usage |
|---|---|---|
| deepseek-chat | deepseek-v4-flash non-thinking mode | deepseek-v4-flash + disable thinking |
| deepseek-reasoner | deepseek-v4-flash thinking mode | deepseek-v4-flash or deepseek-v4-pro + enable thinking |
If you previously used deepseek-reasoner for code review, complex SQL, or long-form reasoning, this is a good opportunity to evaluate deepseek-v4-pro. If the workload is just customer service, summarization, or classification, deepseek-v4-flash is the more sensible default.
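The two-row replacement can also live in one place in your code, so call sites stop caring about aliases. The sketch below is one way to do it for the OpenAI-format path; `LEGACY_MODELS` and `migrate_request` are hypothetical names, and the `thinking` payload follows the `extra_body` shape used in this article's OpenAI example.

```python
# Old alias -> (explicit model name, thinking mode), per the migration table.
LEGACY_MODELS = {
    "deepseek-chat": ("deepseek-v4-flash", "disabled"),
    "deepseek-reasoner": ("deepseek-v4-flash", "enabled"),
}


def migrate_request(model: str) -> dict:
    """Return keyword arguments with an explicit model and thinking setting.

    Names that are not legacy aliases are passed through unchanged,
    without forcing a thinking mode on them.
    """
    if model in LEGACY_MODELS:
        new_model, thinking = LEGACY_MODELS[model]
        return {
            "model": new_model,
            "extra_body": {"thinking": {"type": thinking}},
        }
    return {"model": model}


if __name__ == "__main__":
    print(migrate_request("deepseek-chat"))
    print(migrate_request("deepseek-reasoner"))
```

The returned dict can be unpacked into `client.chat.completions.create(**migrate_request(name), messages=...)`. Moving a reasoning workload to deepseek-v4-pro then becomes a one-line change to the table.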

OpenAI format: the minimal-change version
DeepSeek’s OpenAI format still uses Chat Completions. OpenAI’s official API itself also follows the message-list style of POST /v1/chat/completions (OpenAI API Reference), so most SDKs only need two changes.
```python
# pip install openai
import os

from openai import OpenAI

# Point the OpenAI SDK at DeepSeek's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[
        {"role": "system", "content": "You are a rigorous code review assistant."},
        {"role": "user", "content": "Check this Python code for potential bugs."},
    ],
    # thinking is not a native SDK argument, so it goes through extra_body.
    extra_body={"thinking": {"type": "disabled"}},
    stream=False,
)
print(resp.choices[0].message.content)
```
To enable thinking mode, replace the extra_body argument in the call above with:
```python
    reasoning_effort="high",
    extra_body={"thinking": {"type": "enabled"}},
```
DeepSeek’s thinking mode is enabled by default. In the OpenAI SDK, thinking should be placed inside extra_body; thinking effort supports high and max (DeepSeek thinking mode). If your tool-calling chain sends assistant messages back, remember one hard rule: for thinking-mode turns involving tool calls, subsequent requests must send back the complete reasoning_content, or you will get a 400.
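The pass-through rule is easy to break when a wrapper rebuilds assistant messages field by field. Here is a minimal sketch of a helper that keeps `reasoning_content` on tool-call turns when appending to the conversation history. The function name is hypothetical, and the message shape is assumed to be the usual Chat Completions assistant message as a plain dict. Whether `reasoning_content` should also be kept on turns without tool calls is not stated above, so this sketch drops it there; treat that as an assumption to check against the DeepSeek documentation.

```python
def assistant_turn_for_history(message: dict) -> dict:
    """Build the assistant entry to append to `messages` for the next request.

    For turns that contain tool calls, reasoning_content is copied over
    verbatim so the follow-up request carries it back in full.
    """
    turn = {"role": "assistant", "content": message.get("content")}
    tool_calls = message.get("tool_calls")
    if tool_calls:
        turn["tool_calls"] = tool_calls
        reasoning = message.get("reasoning_content")
        if reasoning is not None:
            # Do not trim or summarize: the rule asks for the complete text.
            turn["reasoning_content"] = reasoning
    return turn


if __name__ == "__main__":
    # Hypothetical assistant message, shaped like a Chat Completions response.
    reply = {
        "role": "assistant",
        "content": None,
        "reasoning_content": "I need the current weather before answering.",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{}"},
            }
        ],
    }
    print(assistant_turn_for_history(reply))
```

With the OpenAI SDK, `resp.choices[0].message.model_dump()` should give a dict of this shape, though it is worth confirming that the extra `reasoning_content` field survives the dump in your SDK version.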
Anthropic format: leave a back door for the Claude toolchain
If you have already built around the Anthropic Messages API with system prompts, max_tokens, and client.messages.create(), just change the Base URL:
```python
# pip install anthropic
import os

import anthropic

# Point the Anthropic SDK at DeepSeek's Anthropic-compatible endpoint.
client = anthropic.Anthropic(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com/anthropic",
)

msg = client.messages.create(
    model="deepseek-v4-pro",
    max_tokens=1000,
    system="You are a senior backend engineer.",
    messages=[{"role": "user", "content": "Give me a fix for Redis cache penetration."}],
    thinking={"type": "enabled"},
    output_config={"effort": "high"},
)
# msg.content is a list of content blocks, not a plain string.
print(msg.content)
```
Anthropic’s official Messages API also uses the messages.create structure, with core fields including model, max_tokens, system, and messages (Anthropic Messages API). So the real reason to choose the Anthropic format is not that it is “more advanced,” but that it requires fewer changes to Claude ecosystem code.
Cost estimation: start by assuming cache misses
DeepSeek’s Chinese pricing page charges per million tokens: for deepseek-v4-flash, input cache hits cost 0.02 yuan, input cache misses cost 1 yuan, and output costs 2 yuan; for deepseek-v4-pro, input cache hits cost 0.025 yuan, input cache misses cost 3 yuan, and output costs 6 yuan (DeepSeek pricing page).
A rough calculation: 200M input tokens and 50M output tokens per month, first assuming all input tokens are cache misses.
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| deepseek-v4-flash | 200 yuan | 100 yuan | 300 yuan |
| deepseek-v4-pro | 600 yuan | 300 yuan | 900 yuan |
The actual bill depends on the cache hit rate. For RAG, agents, or long system prompts, put the stable prefix at the front of the request. Once the hit rate rises, input costs drop significantly. Still, don’t budget with the most optimistic cache numbers on the first day of migration.
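To see how much the hit rate moves the bill, the same arithmetic can be parameterized. This is a rough sketch using only the per-million-token prices quoted above; `PRICES` and `monthly_cost` are hypothetical names, and the 50% hit rate in the example is an illustrative figure, not a measured one.

```python
# Yuan per million tokens, taken from the pricing figures quoted above.
PRICES = {
    "deepseek-v4-flash": {"hit": 0.02, "miss": 1.0, "output": 2.0},
    "deepseek-v4-pro": {"hit": 0.025, "miss": 3.0, "output": 6.0},
}


def monthly_cost(model: str, input_m: float, output_m: float,
                 cache_hit_rate: float = 0.0) -> float:
    """Estimate monthly cost in yuan.

    input_m and output_m are token volumes in millions; cache_hit_rate is
    the fraction of input tokens billed at the cache-hit price (0.0 to 1.0).
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    price = PRICES[model]
    hit_m = input_m * cache_hit_rate
    miss_m = input_m - hit_m
    return hit_m * price["hit"] + miss_m * price["miss"] + output_m * price["output"]


if __name__ == "__main__":
    for model in PRICES:
        worst = monthly_cost(model, 200, 50)  # every input token is a miss
        half = monthly_cost(model, 200, 50, cache_hit_rate=0.5)
        print(f"{model}: all misses {worst:.2f} yuan, 50% hits {half:.2f} yuan")
```

Budgeting with `cache_hit_rate=0.0` reproduces the table above; raising it only after you have real hit-rate data keeps the estimate conservative.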
Relay API users: onehop is the easier path
If your goal is not only to connect to DeepSeek, but to switch between Claude, GPT, and Gemini within the same codebase, onehop is a time-saving path: it is OpenAI/Anthropic compatible, the Base URL changes to https://api.onehop.ai/v1, new accounts get $10, pricing is lower than official rates, and no card binding is required.
OpenAI SDK example:
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["ONEHOP_API_KEY"],
    base_url="https://api.onehop.ai/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Explain this API migration plan in three points."}],
)
print(resp.choices[0].message.content)
```
The Anthropic SDK can also point to the same Base URL:
```python
import os

import anthropic

client = anthropic.Anthropic(
    api_key=os.environ["ONEHOP_API_KEY"],
    base_url="https://api.onehop.ai/v1",
)
```
For a production migration, I would schedule it like this: replace the old model names today; run smoke tests for both the OpenAI and Anthropic paths this week; finish the thinking-mode logic and the reasoning_content pass-through for tool calls before the end of the month; and delete every instance of deepseek-chat and deepseek-reasoner before mid-July. If you want fewer vendor configurations to maintain, you can call Claude and other models through onehop, or register first and use the $10 in trial credits to get the workflow running.
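For the last step, a quick scan helps confirm nothing still references the old names. Below is a minimal sketch; the function names and the list of file suffixes are my own assumptions, so adjust them to whatever your repository actually contains.

```python
import re
from pathlib import Path

# Match the two legacy names only as whole identifiers.
LEGACY = re.compile(r"(?<![\w-])deepseek-(?:chat|reasoner)(?![\w-])")


def find_legacy_names(text: str) -> list[tuple[int, str]]:
    """Return (line number, matched name) pairs for legacy model names."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LEGACY.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root: str, suffixes=(".py", ".ts", ".json", ".yaml", ".yml")):
    """Yield (path, line number, name) for every hit under root."""
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for lineno, name in find_legacy_names(text):
                yield path, lineno, name


if __name__ == "__main__":
    for path, lineno, name in scan_tree("."):
        print(f"{path}:{lineno}: {name}")
```

Running it from the repository root lists each remaining occurrence with its file and line, which makes a reasonable CI check during the compatibility period.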