OpenAI Agents SDK Native Sandbox and Manifest Guide
June 16, 2026 · 25 min read · GPT

OpenAI shipped a major Agents SDK update on April 15, 2026, and the tell was the install line in the launch post: pip install "openai-agents>=0.14.0" (OpenAI). That version floor matters. This was not a new prompt template or another function-calling wrapper. It was OpenAI moving file work, shell work, patching, sandbox lifecycle, and workspace description into SDK-level primitives.
For developers building coding agents, document agents, data-cleanup agents, or repo-maintenance bots, the design shift is simple: stop making every team rebuild the same brittle harness around Docker, temp directories, tool schemas, file staging, and retry logic. The SDK now gives you a model-native harness, sandbox-native execution, filesystem tools, MCP, AGENTS.md, shell access, apply_patch, and a Manifest abstraction for describing portable workspaces.

The change: from tool calls to a real workspace
The original Agents SDK was already useful for orchestration: agents, tools, handoffs, guardrails, tracing. The April update adds the missing runtime shape for agents that need to work with files over time.
OpenAI describes the updated SDK as helping developers build agents that can inspect files, run commands, edit code, and work on long-horizon tasks inside controlled sandbox environments (OpenAI). The key word is "controlled." A serious file-working agent needs more than a list of tools. It needs a root directory, mounted inputs, output locations, a shell, permissions, snapshots, and a way to resume when the container dies.
Before this update, a typical production setup looked like this:
- create a temp workspace
- copy files into it
- expose read, write, shell, and patch tools
- validate paths manually
- start a sandbox or container
- collect artifacts
- snapshot state if the job runs long
- translate all of that into model-facing instructions
That glue code is where many agent projects quietly get messy. The new SDK turns much of it into first-class configuration.
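For contrast, here is a minimal sketch of the hand-rolled glue the list above describes, using only the Python standard library. The function name run_in_scratch_workspace and the repo/ and output/ layout are hypothetical choices for illustration; note that a plain subprocess with a pinned working directory is not real isolation, which is exactly the gap a sandbox backend fills.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_scratch_workspace(src: Path, command: list[str]) -> dict[str, bytes]:
    """Hypothetical harness: copy src into a fresh temp workspace, run one
    command there, and return the files the command left under output/."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Stage inputs by copying the source tree to <root>/repo.
        shutil.copytree(src, root / "repo")
        out_dir = root / "output"
        out_dir.mkdir()
        # The "sandbox" here is only a subprocess whose cwd is the workspace.
        # It is not isolation: the command can still read and write the host.
        subprocess.run(command, cwd=root, check=True, timeout=60)
        # Collect artifacts before the temp directory is deleted.
        return {
            p.relative_to(out_dir).as_posix(): p.read_bytes()
            for p in out_dir.rglob("*")
            if p.is_file()
        }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src:
        Path(src, "task.md").write_text("Write a report.\n")
        script = "import pathlib; pathlib.Path('output/report.txt').write_text('ok')"
        print(run_in_scratch_workspace(Path(src), [sys.executable, "-c", script]))
```

Every line of this (staging, cwd pinning, artifact collection, cleanup) is something each team would otherwise rewrite and get subtly wrong.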
OpenAI’s launch post listed built-in support for Blaxel, Cloudflare, Daytona, E2B, Modal, Runloop, and Vercel as sandbox providers, with a “bring your own sandbox” path as well (OpenAI). That is seven hosted providers at launch, plus local development paths.
Manifest is the portability layer
The Manifest abstraction is the most practical part of the release. It describes what the sandbox workspace should contain before the model starts working.
In the Python docs, Sandbox Agents are marked beta, require Python 3.10 or higher, and are presented as a way to give the model a persistent workspace where it can search document sets, edit files, run commands, generate artifacts, and resume from saved sandbox state (OpenAI Agents SDK Python docs).
A compact Python shape looks like this:
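```python
import asyncio

from agents import Runner
from agents.run import RunConfig
from agents.sandbox import Manifest, SandboxAgent, SandboxRunConfig
from agents.sandbox.entries import LocalDir
from agents.sandbox.sandboxes.unix_local import UnixLocalSandboxClient

# The Manifest maps the host's ./repo directory to the workspace path "repo".
agent = SandboxAgent(
    name="Repo maintainer",
    model="gpt-5.5",
    instructions="Read repo/task.md, edit with apply_patch, then run the targeted test.",
    default_manifest=Manifest(entries={"repo": LocalDir(src="./repo")}),
)


async def main() -> None:
    # Runner.run is a coroutine, so it must be awaited inside an async function.
    result = await Runner.run(
        agent,
        "Fix the failing test and summarize the change.",
        run_config=RunConfig(
            # Unix-local sandbox client: a development-time backend.
            sandbox=SandboxRunConfig(client=UnixLocalSandboxClient())
        ),
    )
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```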
The important part is not the syntax. It is the contract. The Manifest can describe local files, directories, Git repos, synthetic files, environment variables, users, groups, output directories, and remote storage mounts. The JavaScript docs say Manifest entry paths are workspace-relative, cannot be absolute, and cannot escape the workspace with .., which is exactly the kind of boring constraint you want enforced by the runtime rather than remembered in every prompt (OpenAI Agents SDK JS docs).
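To make that path rule concrete, here is a small validator that mirrors the three documented constraints: workspace-relative, never absolute, never climbing above the root with "..". The name check_manifest_path is hypothetical and this is an illustration of the rule, not the SDK's own implementation.

```python
from pathlib import PurePosixPath


def check_manifest_path(path: str) -> PurePosixPath:
    """Hypothetical check: reject absolute paths and any path whose '..'
    segments would climb above the workspace root."""
    p = PurePosixPath(path)
    if p.is_absolute():
        raise ValueError(f"absolute path not allowed: {path!r}")
    depth = 0
    for part in p.parts:  # pathlib already drops '.' segments
        if part == "..":
            depth -= 1
            if depth < 0:
                raise ValueError(f"path escapes the workspace: {path!r}")
        else:
            depth += 1
    return p


if __name__ == "__main__":
    for candidate in ["repo/src/../tests", "/etc/passwd", "repo/../../secrets"]:
        try:
            print("ok  ", check_manifest_path(candidate))
        except ValueError as err:
            print("deny", err)
```

Tracking depth rather than rejecting every ".." lets paths like repo/src/../tests through, since they never leave the workspace.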

Capabilities: shell, filesystem, skills, memory, compaction
A SandboxAgent is not just a normal agent with a temp folder. It carries sandbox-specific capabilities.
The JS concepts docs list built-in capabilities including shell(), filesystem(), skills(), memory(), and compaction() (OpenAI Agents SDK JS docs). Defaults matter here: the docs state that Capabilities.default() includes filesystem, shell, and compaction. That means the common coding-agent loop is no longer a pile of bespoke tool definitions.
The filesystem capability exposes patch-style file edits. The shell capability exposes command execution inside the sandbox session. Skills let you progressively disclose specialized instructions or procedures. Memory and compaction help longer runs keep useful state without stuffing every prior token back into the next turn.
This matches how strong coding agents actually work. They inspect. They run a command. They edit a file. They run a smaller command. They inspect the diff. They summarize what changed. If your harness treats each step as an unrelated API call, the model spends too much attention reconstructing its world. A sandbox session gives the model a place to stand.
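Compaction, mentioned above, is easiest to picture with a toy version: keep the most recent turns verbatim and fold everything older into a single summary message. This sketch is hypothetical and is not the SDK's compaction algorithm; in particular, a real system would have the model write the summary, whereas the truncated concatenation here is only a stand-in.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


def compact(history: list[Message], keep_last: int = 4,
            max_summary_chars: int = 200) -> list[Message]:
    """Toy compaction: one summary message plus the last keep_last turns."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(history) <= keep_last:
        return list(history)
    older, recent = history[:-keep_last], history[-keep_last:]
    # Stand-in for a model-written summary of the older turns.
    digest = " | ".join(f"{m.role}: {m.content}" for m in older)
    summary = Message("system", "Summary of earlier turns: " + digest[:max_summary_chars])
    return [summary, *recent]
```

The trade-off is the usual one: a larger keep_last preserves more exact context, while a smaller one leaves more of the budget for new tool output.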
AGENTS.md also fits naturally into this model. The open AGENTS.md site describes it as a Markdown format for guiding coding agents and says it is used by more than 60,000 open-source projects (AGENTS.md). That file should contain build commands, test instructions, style rules, and repo-specific warnings. In the sandbox world, AGENTS.md becomes workspace-local operating context rather than a giant prompt pasted into every task.
Python-first, TypeScript catching up
At launch, this was Python-first. TechCrunch reported on April 15 that the new harness and sandbox capabilities were launching first in Python, with TypeScript support planned later (TechCrunch). PyPI backs up the date: openai-agents versions 0.14.0 and 0.14.1 were uploaded on April 15, 2026 (PyPI).
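Given those version floors (Python 3.10+ for Sandbox Agents, openai-agents 0.14.0 or later for the sandbox release), a preflight check before a run can save a confusing import error. This is a sketch: the version parsing is deliberately naive and ignores pre-release suffixes, so packaging.version.Version is the more correct choice in real code.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple[int, ...]:
    """Leading numeric release segment: '0.14.1' -> (0, 14, 1).
    Naive on purpose: pre-release and dev suffixes are ignored."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(x) for x in m.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.14' compares equal to '0.14.0'.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def preflight(min_sdk: str = "0.14.0") -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Sandbox Agents require Python 3.10 or higher")
    try:
        installed = version("openai-agents")
    except PackageNotFoundError:
        problems.append("openai-agents is not installed")
    else:
        if not meets_minimum(installed, min_sdk):
            problems.append(f"openai-agents {installed} is older than {min_sdk}")
    return problems


if __name__ == "__main__":
    print(preflight() or "ok")
```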
As of June 16, 2026, the practical picture is more balanced. The official JS docs now include beta Sandbox Agents, require Node.js 22 or higher, and show Manifest, SandboxAgent, UnixLocalSandboxClient, Docker support, and hosted provider clients through @openai/agents-extensions (OpenAI Agents SDK JS quickstart). The JS docs also note Deno and Bun can work when package resolution and runtime APIs are compatible.
| Area | Python | TypeScript / JavaScript |
|---|---|---|
| Launch status on Apr 15 | First supported path | Planned later |
| Current sandbox docs | Beta, Python 3.10+ | Beta, Node.js 22+ |
| Local sandbox | UnixLocalSandboxClient | UnixLocalSandboxClient |
| Docker sandbox | openai-agents[docker] | DockerSandboxClient |
| Hosted providers | Supported via SDK integrations | @openai/agents-extensions provider paths |
That does not mean the two ecosystems are identical. Python was the original launch surface, and many examples still land there first. TypeScript now has enough official surface area to prototype real sandbox agents, but hosted-provider details, PTY behavior, mounts, and lifecycle support still need careful reading per backend.
How I would structure a file-working agent now
The mistake is to treat the sandbox as a magic safety box. It is a runtime boundary, not a product spec. You still need to design the workspace.
A clean structure looks like this:
- repo/: the working tree or mounted repository
- task.md: the task spec the model must read first
- inputs/: read-only documents, datasets, screenshots, or logs
- output/: the only place final generated artifacts should go
- AGENTS.md: build, test, style, and safety instructions
- sandbox user: a non-root identity where the backend supports it
- Manifest env: non-secret config persisted by default, secrets marked ephemeral
The Manifest should describe inputs. The agent instructions should describe workflow. The user prompt should describe the one-off task. Keep those separate. The JS docs explicitly warn against stuffing long reference material into instructions when it belongs in the Manifest (OpenAI Agents SDK JS docs).
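The layout listed above can be scaffolded locally with a few lines of standard-library Python, which is handy for testing instructions before wiring up a Manifest. Both helpers are hypothetical: scaffold_workspace only clears write bits on inputs/ (a hint, not enforcement, since root ignores it), and split_env uses an assumed name-based heuristic (KEY, TOKEN, SECRET, PASSWORD) to separate persisted config from values that should be ephemeral.

```python
import os
import stat
import tempfile
from pathlib import Path

# Hypothetical heuristic: variable names containing these mark secrets.
SECRET_MARKERS = ("KEY", "TOKEN", "SECRET", "PASSWORD")


def split_env(env: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Partition env vars into (persisted config, ephemeral secrets)."""
    persisted: dict[str, str] = {}
    ephemeral: dict[str, str] = {}
    for name, value in env.items():
        is_secret = any(m in name.upper() for m in SECRET_MARKERS)
        (ephemeral if is_secret else persisted)[name] = value
    return persisted, ephemeral


def scaffold_workspace(root: Path, task: str, agents_md: str) -> None:
    """Create repo/, inputs/, output/, task.md, and AGENTS.md under root."""
    for d in ("repo", "inputs", "output"):
        (root / d).mkdir(parents=True, exist_ok=True)
    (root / "task.md").write_text(task)
    (root / "AGENTS.md").write_text(agents_md)
    # r-x for everyone on inputs/: drops write bits (not enforced for root).
    os.chmod(root / "inputs", 0o555)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        scaffold_workspace(Path(tmp), "Fix the failing test.\n", "Run: pytest -q\n")
        print(sorted(p.name for p in Path(tmp).iterdir()))
    print(split_env({"LOG_LEVEL": "info", "OPENAI_API_KEY": "sk-example"}))
```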
For production, pick the backend based on the blast radius, not the demo path. Unix-local is fine for development. Docker is a better default when you need repeatability. Hosted providers make sense when you need clean isolation per run, remote execution, scaling, or provider-specific snapshot behavior. The JS clients docs state that hosted provider support varies and developers should check provider docs for environment variables, ports, PTY, snapshots, and cleanup behavior (OpenAI Agents SDK JS clients).
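The backend guidance in that paragraph reduces to a small decision rule. The sketch below encodes it with hypothetical names (Needs, pick_backend, and the strings "unix_local", "docker", "hosted"); it returns a label, not an SDK client, and a real choice would also weigh the provider-specific details the docs mention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    development_only: bool = False
    repeatable_env: bool = False
    per_run_isolation: bool = False
    remote_or_scaled: bool = False
    provider_snapshots: bool = False


def pick_backend(needs: Needs) -> str:
    """Hypothetical rule: choose by blast radius, not by the demo path."""
    if needs.per_run_isolation or needs.remote_or_scaled or needs.provider_snapshots:
        return "hosted"
    if needs.repeatable_env or not needs.development_only:
        return "docker"
    return "unix_local"


if __name__ == "__main__":
    print(pick_backend(Needs(development_only=True)))
    print(pick_backend(Needs(per_run_isolation=True)))
```

Anything that is not explicitly development-only falls through to Docker, which keeps Unix-local from quietly becoming the production default.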
The ecosystem read
This update is important because it standardizes the shape of file-working agents. The industry already converged on a few primitives: MCP for external tools, AGENTS.md for repo instructions, shell for real inspection, patching for reviewable edits, and sandboxes for containment. OpenAI’s Agents SDK now packages those pieces into a runtime developers can actually compose.
The sharp edge remains permissions. A sandboxed agent with broad network, writable mounts, long-lived credentials, and vague instructions can still do damage. The Manifest helps because it makes workspace inputs and grants visible. It does not remove the need for approval policies, secret hygiene, dependency pinning, and artifact review.
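One concrete form of an approval policy is a gate in front of shell commands: auto-allow a short list of read-only or test commands, route risky ones to a human, and refuse anything unparseable. The policy tables and the classify function below are hypothetical and deliberately conservative; chaining characters force a human decision because "ls && curl ..." would otherwise slip through on its first token.

```python
import shlex

# Hypothetical policy tables; tune these per repository.
AUTO_ALLOW = {"ls", "cat", "grep", "pytest", "git diff", "git status"}
ALWAYS_ASK = {"curl", "wget", "ssh", "rm", "git push", "pip install", "npm install"}
# Characters that let one command line smuggle in a second command.
CHAINING = set(";&|`$<>")


def classify(command: str) -> str:
    """Return 'allow', 'ask' (needs human approval), or 'deny'."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "deny"
    if not tokens:
        return "deny"
    if any(ch in CHAINING for tok in tokens for ch in tok):
        return "ask"
    # Match on the program name and on "program subcommand".
    heads = {tokens[0], " ".join(tokens[:2])}
    if heads & ALWAYS_ASK:
        return "ask"
    if heads & AUTO_ALLOW:
        return "allow"
    return "ask"  # unknown commands default to a human decision


if __name__ == "__main__":
    for cmd in ["pytest -q", "git push origin main", "ls && curl http://example.com"]:
        print(cmd, "->", classify(cmd))
```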
The best use case today is not “agent does everything.” It is narrower and more valuable: give the model a bounded workspace, a clear task file, repo-local instructions, shell and patch tools, and one explicit verification path. Let it work like a junior engineer in a disposable environment. Then inspect the diff.
That is a much healthier abstraction than hand-rolling another half-sandbox around a chat loop.
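The last two steps of that workflow, one explicit verification path and a diff review, can be sketched with difflib and subprocess. The helper names workspace_diff and verify are hypothetical, and the sketch assumes a pristine copy of the workspace was kept before the run and that the compared files are text.

```python
import difflib
import subprocess
import sys
import tempfile
from pathlib import Path


def _files(root: Path) -> set[str]:
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def workspace_diff(before: Path, after: Path) -> str:
    """Unified diff of text files between a pristine copy and the agent's workspace."""
    chunks: list[str] = []
    for name in sorted(_files(before) | _files(after)):
        old, new = before / name, after / name
        # Added or deleted files diff against an empty side.
        a = old.read_text().splitlines(keepends=True) if old.exists() else []
        b = new.read_text().splitlines(keepends=True) if new.exists() else []
        chunks.extend(difflib.unified_diff(a, b, fromfile=f"a/{name}", tofile=f"b/{name}"))
    return "".join(chunks)


def verify(workspace: Path, command: list[str]) -> bool:
    """The single verification path: run one command, pass on exit code 0."""
    return subprocess.run(command, cwd=workspace, timeout=300).returncode == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        Path(a, "calc.py").write_text("TOTAL = 1\n")
        Path(b, "calc.py").write_text("TOTAL = 2\n")
        print(workspace_diff(Path(a), Path(b)))
        print(verify(Path(b), [sys.executable, "-c", "import calc; assert calc.TOTAL == 2"]))
```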