
Calling the Gemini API with the OpenAI SDK: An Integration Guide Requiring Only base_url, Key, and Model Name Changes

June 14, 2026 · 9 min read · GPT / Gemini / Claude

[Figure: integration diagram. A code window with three highlighted configuration lines (base_url, api_key, model) connects to a Gemini API node.]

Google documents this directly: the Gemini API offers an OpenAI-compatible endpoint, so you can call it with the OpenAI Python SDK, the JavaScript SDK, or plain REST. The official examples use https://generativelanguage.googleapis.com/v1beta/openai/ as the base_url and gemini-3.5-flash as the model name (Google AI for Developers).
This is very practical for teams that already have OpenAI SDK code. You do not need to rewrite the client or change the message structure. Replace three configuration items, get a request working, and only then decide whether you need Gemini’s native capabilities.

[Figure: migration flowchart. An old OpenAI SDK configuration card on the left, an arrow labeled “change 3 lines” in the middle, and a new Gemini API configuration card on the right; the three lines are base_url, api_key, and model.]

First, see which three items need to change

Google’s documentation spells this out right after its example: exactly three items change, namely api_key, base_url, and model. The OpenAI Python SDK accepts base_url as a constructor argument or through the OPENAI_BASE_URL environment variable (openai-python), and the corresponding option in the Node SDK is baseURL (openai-node).

Minimal Python example:

import os

from openai import OpenAI

# Read the Gemini key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

resp = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain in one sentence what a vector database is."},
    ],
)

print(resp.choices[0].message.content)

If your old code already calls client.chat.completions.create(), in most cases you only need to move the client initialization into configuration. Do not hard-code the model name inside business functions; switching later to Batch, Flex, or another model is much more painful if you do.
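One way to pull the three values out of business code is a small settings object, sketched below. This is an illustration, not Google's or OpenAI's recommended layout: the LLM_BASE_URL and LLM_MODEL variable names and the LLMSettings, load_settings, and make_client helpers are hypothetical; only GEMINI_API_KEY, the endpoint URL, and the model name come from the examples above.

```python
import os
from dataclasses import dataclass

# Defaults taken from the examples in this article.
DEFAULT_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
DEFAULT_MODEL = "gemini-3.5-flash"


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: str
    model: str


def load_settings(env=os.environ) -> LLMSettings:
    """Read the three provider-specific values from a mapping (the environment by default)."""
    return LLMSettings(
        api_key=env["GEMINI_API_KEY"],  # fails loudly if the key is missing
        base_url=env.get("LLM_BASE_URL", DEFAULT_BASE_URL),  # hypothetical variable name
        model=env.get("LLM_MODEL", DEFAULT_MODEL),  # hypothetical variable name
    )


def make_client(settings: LLMSettings):
    """Build an OpenAI SDK client from the settings."""
    # Imported here so that loading settings does not require the SDK.
    from openai import OpenAI

    return OpenAI(api_key=settings.api_key, base_url=settings.base_url)
```

Business functions then receive a client and settings.model as arguments, so switching provider or model becomes an environment change rather than a code change.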

Node.js follows the same pattern

In the JavaScript SDK the option names are camelCase: base_url becomes baseURL and api_key becomes apiKey. Everything else stays the same:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.GEMINI_API_KEY,
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

const resp = await client.chat.completions.create({
  model: "gemini-3.5-flash",
  messages: [{ role: "user", content: "Give me a Redis rate-limiting approach." }],
});

console.log(resp.choices[0].message.content);

REST calls work directly as well: POST to https://generativelanguage.googleapis.com/v1beta/openai/chat/completions with the request header Authorization: Bearer $GEMINI_API_KEY. Sending one request with curl first is a quick way to rule out SDK, proxy, and environment variable problems.
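The same REST request can be sketched with only the Python standard library, which removes the SDK from the picture entirely. The build_request and send helpers are hypothetical names for this illustration; the URL, header, and body shape follow the description above.

```python
import json
import urllib.request

ENDPOINT = "https://generativelanguage.googleapis.com/v1beta/openai/chat/completions"


def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completions POST request."""
    body = json.dumps(
        {"model": model, "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text."""
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return data["choices"][0]["message"]["content"]
```

Calling send(build_request(os.environ["GEMINI_API_KEY"], "gemini-3.5-flash", "ping")) after importing os should return text if the key and network path are fine. An HTTP error here points at the key, the URL, or a proxy rather than at the SDK.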

[Figure: architecture diagram. Three entry points (Python SDK, Node SDK, REST) flow into the same OpenAI-compatible endpoint, which connects to Gemini 3.5 Flash.]

Do not estimate pricing by gut feeling; calculate based on output

As of 2026-06-14, the standard price for gemini-3.5-flash listed on Gemini’s official pricing page is $1.50 / 1M tokens for input and $9.00 / 1M tokens for output, while Batch and Flex are both $0.75 / 1M tokens for input and $4.50 / 1M tokens for output (Gemini API pricing). The official page also states that output pricing includes thinking tokens, so for reasoning tasks, do not focus only on the input price.

| Mode | Input / 1M tokens | Output / 1M tokens | Best suited for |
| --- | --- | --- | --- |
| Standard | $1.50 | $9.00 | Online requests, low latency |
| Batch | $0.75 | $4.50 | Offline batch processing |
| Flex | $0.75 | $4.50 | Workloads that tolerate flexible scheduling |
My suggestion is simple: use Standard first for chat, agents, and editor plugins; use Batch/Flex as much as possible for log summarization, data cleaning, and offline evaluation. For tasks with long outputs, the cost gap will be magnified.
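To see how much output length drives the bill, a small calculator helps. The sketch below uses only the prices quoted above (as of 2026-06-14); the estimate_cost function and the token counts in the usage note are made up for illustration.

```python
# USD per 1M tokens, as quoted in this article for gemini-3.5-flash (2026-06-14).
PRICES = {
    "standard": {"input": 1.50, "output": 9.00},
    "batch": {"input": 0.75, "output": 4.50},
    "flex": {"input": 0.75, "output": 4.50},
}


def estimate_cost(mode: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    output_tokens should include thinking tokens, since output pricing covers them.
    """
    price = PRICES[mode]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000
```

For a hypothetical request with 2,000 input tokens and 1,500 output tokens, estimate_cost("standard", 2000, 1500) gives about $0.0165, of which roughly $0.0135 is output; the same request in Batch or Flex comes to about $0.00825.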

The most common pitfalls during migration

First, do not omit the trailing /openai/ in base_url. Many 404s and “model not found” errors are essentially caused by hitting Gemini’s native path instead of the compatibility path.
Second, keep environment variables clearly separated. OPENAI_API_KEY can continue to be used for OpenAI, while for Gemini it is better to store GEMINI_API_KEY separately. Do not let two providers compete for the same variable in CI.
Third, run models.list() first. Google’s docs provide examples for listing models and retrieving a model; before production, use them to confirm that the account region, permissions, and model ID are all correct.
Fourth, capabilities such as streaming, function calling, and image input have examples in the compatibility layer, but do not assume that every OpenAI parameter is supported one-to-one. First run a regression test with one real business request.
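The first and third checks above can be automated as a pre-flight step. This is a minimal sketch under assumptions: check_gemini_base_url and model_is_listed are hypothetical helper names, and comparing only the last path segment of each model ID is a defensive choice in case the listing returns prefixed IDs such as "models/...", which you should verify against your own models.list() output.

```python
def check_gemini_base_url(base_url: str) -> str:
    """Return base_url with exactly one trailing slash, or raise if /openai/ is missing."""
    normalized = base_url.rstrip("/") + "/"
    if not normalized.endswith("/openai/"):
        raise ValueError(f"base_url should end with /openai/, got: {base_url}")
    return normalized


def model_is_listed(client, model_id: str) -> bool:
    """Return True if model_id appears in client.models.list().

    client is expected to be an OpenAI SDK client; only models.list() is used.
    """
    wanted = model_id.split("/")[-1]
    return any(model.id.split("/")[-1] == wanted for model in client.models.list())
```

Run both once at startup or in CI with the real client, and fail the deployment if either check does not pass.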

If you want to maintain fewer keys

If your team uses Claude, GPT, and Gemini at the same time, applying for access one by one, binding cards, and configuring quotas can be annoying. An easier path is onehop: change the OpenAI SDK’s base_url to https://api.onehop.ai/v1, use a onehop key, and you can call Claude, GPT, and Gemini through OpenAI- and Anthropic-compatible interfaces. It is meant to reduce integration effort; pricing is lower than official pricing, new accounts get $10, and no card is required.

import os

from openai import OpenAI

# Read the onehop key from the environment instead of hard-coding it.
client = OpenAI(
    api_key=os.environ["ONEHOP_API_KEY"],
    base_url="https://api.onehop.ai/v1",
)

resp = client.chat.completions.create(
    model="gemini-3.5-flash",
    messages=[{"role": "user", "content": "Summarize this error log into three causes."}],
)

print(resp.choices[0].message.content)

If you want to try Claude or multi-model routing, you can go directly here: Call Claude and other models on onehop. If you want to run a small sample first without binding a card, use this entry point: Sign up and get $10 in trial credits.

Conclusion: make it configurable first; do not hard-code the provider

The most valuable part of Gemini’s OpenAI compatibility is not “yet another model” but the low migration cost. Put base_url, api_key, and model into configuration, and keep using the OpenAI SDK’s chat completions in your business code.
Once that works, add three more things: record input and output tokens, choose Standard or Batch/Flex per task, and put the model ID behind a gradual-rollout switch. That way, whether you connect to Gemini today or to Claude or another compatible service tomorrow, you will not need to touch core business code again.
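Two of those follow-ups, token recording and a rollout switch for the model ID, can be sketched as follows. Both helpers are hypothetical: pick_model uses a simple hash-based percentage split, which is one common approach rather than anything prescribed by Google or OpenAI, and usage_record assumes the response carries the usage.prompt_tokens and usage.completion_tokens fields that the OpenAI SDK exposes on chat completions.

```python
import hashlib


def pick_model(user_id: str, stable_model: str, candidate_model: str, rollout_percent: int) -> str:
    """Deterministically send rollout_percent% of users to candidate_model."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return candidate_model if bucket < rollout_percent else stable_model


def usage_record(resp, model: str) -> dict:
    """Extract token counts from a chat completion response for logging."""
    usage = getattr(resp, "usage", None)
    return {
        "model": model,
        "input_tokens": usage.prompt_tokens if usage else None,
        "output_tokens": usage.completion_tokens if usage else None,
    }
```

Because the bucket depends only on user_id, a given user keeps seeing the same model as the percentage is raised, which makes regressions easier to attribute.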