Calling the Gemini API with the OpenAI SDK: How to Configure base_url, Model Names, and Gemini 3.5 Flash Pricing
June 14, 2026 · 10 min read · Claude / GPT / Gemini

Google’s OpenAI compatibility documentation, updated on 2026-05-18, states directly that Gemini models can be called through the OpenAI Python / JavaScript SDK. Only three fields need to change: api_key, base_url, and model (Google). As of today, 2026-06-14, the pricing page lists the standard paid-tier price for gemini-3.5-flash as $1.50 per 1 million input tokens and $9.00 per 1 million output tokens, thinking tokens included (Google Pricing).
My judgment is simple: if you already have an OpenAI SDK project, don’t rewrite it yet. First connect Gemini as an OpenAI-compatible backend, verify cost, streaming, and tool calling, and then decide whether to migrate to the native Gemini SDK.

1. Minimal change: only switch the endpoint and model name
First install the official OpenAI SDK for Python:
pip install openai
export GEMINI_API_KEY="your Gemini API key"
Then change the client to this:
import os

from openai import OpenAI

# Point the OpenAI client at Google's OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

resp = client.chat.completions.create(
    model="gemini-3.5-flash",
    reasoning_effort="low",
    messages=[
        {"role": "system", "content": "You are a direct, accurate coding assistant."},
        {"role": "user", "content": "Explain SSE streaming output in three sentences."},
    ],
)
print(resp.choices[0].message.content)
Call shapes such as chat.completions.create, messages, and tools stay in the OpenAI Chat Completions style; OpenAI’s own API reference still defines Chat Completions as an interface that generates a reply from a list of messages (OpenAI). So the migration work sits in the configuration layer, not in business code.
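One way to keep the change inside the configuration layer is a small table of provider profiles. The sketch below is only an illustration: the Gemini base URL and model name come from this article, while the PROFILES layout, the load_profile helper, the environment variable names, and the OpenAI model placeholder are my own assumptions.

```python
import os

# Hypothetical provider profiles. The Gemini URL and model come from this
# article; the env var names and the OpenAI model are illustrative placeholders.
PROFILES = {
    "openai": {
        "base_url": "https://api.openai.com/v1/",
        "api_key_env": "OPENAI_API_KEY",
        "model": "gpt-4o-mini",  # placeholder, swap in whatever you use today
    },
    "gemini": {
        "base_url": "https://generativelanguage.googleapis.com/v1beta/openai/",
        "api_key_env": "GEMINI_API_KEY",
        "model": "gemini-3.5-flash",
    },
}


def load_profile(name: str) -> dict:
    """Return the client kwargs and the model name for one provider."""
    profile = PROFILES[name]
    api_key = os.environ.get(profile["api_key_env"])
    if not api_key:
        raise RuntimeError(f"{profile['api_key_env']} is not set")
    return {
        "client_kwargs": {"api_key": api_key, "base_url": profile["base_url"]},
        "model": profile["model"],
    }
```

With this in place, business code would build the client as OpenAI(**cfg["client_kwargs"]) and pass cfg["model"] to chat.completions.create, so switching providers means changing one profile name.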
2. Don’t omit the final slash in base_url
The address in Google’s documentation is:
https://generativelanguage.googleapis.com/v1beta/openai/
If you omit the final /, clients that resolve request paths relative to the base URL can drop the last path segment (openai) and send requests to the wrong path. In production code, extract it into environment variables:
OPENAI_BASE_URL="https://generativelanguage.googleapis.com/v1beta/openai/"
OPENAI_API_KEY="$GEMINI_API_KEY"
OPENAI_MODEL="gemini-3.5-flash"
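To see why the trailing slash matters, here is a minimal sketch using standard relative URL resolution from Python’s standard library. It shows generic URL joining, not the internals of any particular SDK (some SDKs may normalize the base URL themselves), and normalize_base_url is a hypothetical helper of my own.

```python
from urllib.parse import urljoin


def normalize_base_url(url: str) -> str:
    """Ensure the base URL ends with exactly one trailing slash."""
    return url.rstrip("/") + "/"


with_slash = "https://generativelanguage.googleapis.com/v1beta/openai/"
without_slash = "https://generativelanguage.googleapis.com/v1beta/openai"

# Relative resolution keeps the last path segment only when the base ends in "/".
print(urljoin(with_slash, "chat/completions"))
# → https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
print(urljoin(without_slash, "chat/completions"))
# → https://generativelanguage.googleapis.com/v1beta/chat/completions
print(urljoin(normalize_base_url(without_slash), "chat/completions"))
# → https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
```

Normalizing the value once, right where the environment variable is read, is cheaper than debugging a 404 later.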
If you want to avoid switching among multiple vendor accounts, quotas, and bills, onehop is the easier path: change base_url to https://api.onehop.ai/v1, and use the same OpenAI / Anthropic-compatible interface to call Claude, GPT, and Gemini. New accounts get $10 with no card required; it is suitable for doing a PoC first, then deciding whether to connect directly to the official provider.
from openai import OpenAI

client = OpenAI(
    api_key="your onehop key",
    base_url="https://api.onehop.ai/v1",
)

resp = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[{"role": "user", "content": "Give me a FastAPI health check endpoint"}],
)
print(resp.choices[0].message.content)
Entry points: call models such as Claude on onehop, or register to get a free $10 trial credit.
3. Estimate costs based on output tokens first
The standard tier of gemini-3.5-flash is not “so cheap you don’t need to care.” The output price is 6 times the input price:
| Model | Tier | Input / 1 million tokens | Output / 1 million tokens |
|---|---|---|---|
| gemini-3.5-flash | Standard | $1.50 | $9.00 |
| gemini-3.5-flash | Batch | $0.75 | $4.50 |
| gemini-3.5-flash | Flex | $0.75 | $4.50 |
The Batch and Flex figures also come from the same Google pricing page. When writing applications, you should limit max_completion_tokens, especially for summarization, code generation, and Agent tool loops. Longer input can still be cached; runaway output is real money burned.
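A quick back-of-the-envelope calculator makes the output-heavy pricing concrete. This is a rough sketch: the prices are copied from the table above, the function names are my own, and worst_case_output_cost assumes the max_completion_tokens cap bounds all billed output tokens, which you should verify against your own usage data.

```python
# USD per 1 million tokens, copied from the pricing table in this section.
PRICES = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch": {"input": 0.75, "output": 4.50},
    "flex": {"input": 0.75, "output": 4.50},
}


def estimate_cost(input_tokens: int, output_tokens: int, tier: str = "standard") -> float:
    """Estimate the USD cost of one request; output_tokens includes thinking tokens."""
    price = PRICES[tier]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def worst_case_output_cost(max_completion_tokens: int, tier: str = "standard") -> float:
    """Upper bound on output spend per request (assumes the cap bounds billed output)."""
    return estimate_cost(0, max_completion_tokens, tier)


print(f"{estimate_cost(10_000, 2_000):.4f}")    # → 0.0330
print(f"{worst_case_output_cost(4_096):.4f}")   # → 0.0369
```

In the first example, 2,000 output tokens cost more ($0.018) than 10,000 input tokens ($0.015), which is why the output side deserves the first look.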

4. How reasoning_effort maps
Google’s compatibility layer accepts the OpenAI-style reasoning_effort and maps it to Gemini’s thinking configuration (Google):
| reasoning_effort | Gemini 3 Flash thinking_level |
|---|---|
| minimal | minimal |
| low | low |
| medium | medium |
| high | high |
If you do not pass it, the model default is used. Google’s documentation also states one key limitation: Gemini 3 cannot disable thinking; none only applies to some 2.5 models. My recommendation is to default to low in production, and only raise it to medium or high for complex planning or long-chain tool calls. Because output pricing includes thinking tokens, reasoning intensity is not a free knob.
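The “default to low, raise only when needed” policy can be written down as a small lookup. This is a sketch under assumptions: the allowed values come from the mapping table above, but the task categories and the pick_reasoning_effort helper are hypothetical names of my own.

```python
# Values from the mapping table above; "none" is left out because, per the
# article, Gemini 3 cannot disable thinking.
GEMINI_3_EFFORTS = ("minimal", "low", "medium", "high")

# Hypothetical task categories, for illustration only.
EFFORT_BY_TASK = {
    "chat": "low",
    "summarize": "low",
    "planning": "medium",
    "agent_tool_loop": "high",
}


def pick_reasoning_effort(task: str, default: str = "low") -> str:
    """Return the reasoning_effort for a task, falling back to the default."""
    effort = EFFORT_BY_TASK.get(task, default)
    if effort not in GEMINI_3_EFFORTS:
        raise ValueError(f"unsupported reasoning_effort for Gemini 3: {effort!r}")
    return effort


print(pick_reasoning_effort("chat"))      # → low
print(pick_reasoning_effort("planning"))  # → medium
```

The returned value would be passed as reasoning_effort to chat.completions.create. Keeping the policy in one place makes it easy to audit when the bill moves.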
5. Streaming and function calling: usable, but guard against empty chunks
Streaming calls keep the OpenAI SDK style:
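stream = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[{"role": "user", "content": "Write a Redis cache wrapper"}],
    stream=True,
)
for chunk in stream:
    if not chunk.choices:  # defensive: a chunk may arrive without choices
        continue
    delta = chunk.choices[0].delta.content
    if delta:  # skip role-only, tool-call, and empty deltas
        print(delta, end="", flush=True)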
The if delta here is very practical. Streaming responses may contain roles, tool calls, or empty deltas; do not assume every chunk has text.
Function calling also uses tools and tool_choice="auto". Google’s compatibility documentation provides a weather function example and confirms that the Gemini API supports function calling (Google). In real projects, don’t just print what the model returns; check message.tool_calls, execute the local function, and then feed the tool result back to the model as the next-round message.
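The “execute, then feed back” step can be sketched as follows. This is a minimal illustration, not Google’s example: get_weather, TOOL_REGISTRY, and run_tool_calls are hypothetical names, and a SimpleNamespace stands in for resp.choices[0].message so the snippet runs without network access. The attribute names (tool_calls, id, function.name, function.arguments) follow the OpenAI Chat Completions response shape.

```python
import json
from types import SimpleNamespace


def get_weather(location: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return json.dumps({"location": location, "forecast": "sunny"})


# Hypothetical registry mapping tool names to local Python functions.
TOOL_REGISTRY = {"get_weather": get_weather}


def run_tool_calls(message) -> list[dict]:
    """Execute each requested tool and build the follow-up tool messages."""
    follow_up = []
    for call in message.tool_calls or []:
        func = TOOL_REGISTRY.get(call.function.name)
        if func is None:
            result = json.dumps({"error": f"unknown tool {call.function.name}"})
        else:
            # Arguments arrive as a JSON string, not as a dict.
            args = json.loads(call.function.arguments or "{}")
            result = func(**args)
        follow_up.append({"role": "tool", "tool_call_id": call.id, "content": result})
    return follow_up


# Stand-in for resp.choices[0].message from a real response.
fake_message = SimpleNamespace(
    tool_calls=[
        SimpleNamespace(
            id="call_1",
            function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
        )
    ]
)
tool_messages = run_tool_calls(fake_message)
print(tool_messages)
```

In a real loop you would append the assistant message that contains tool_calls to messages, extend messages with tool_messages, and call chat.completions.create again until the model stops requesting tools.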
Conclusion: the minimum cost of migrating to Gemini is three lines of configuration; what you really need to watch is output tokens, thinking intensity, empty streaming chunks, and closing the tool-calling loop. If you simply want to put Claude, GPT, and Gemini into the same OpenAI SDK project quickly, onehop’s unified entry point saves a fair amount of configuration time: call models such as Claude on onehop, or claim the free $10 trial credit when you register.