Code Execution Tool
A tool that lets an LLM or agent run code (typically in a general-purpose language such as Python) inside a sandbox and receive the output - stdout, stderr, generated files - as context for further reasoning.
Details
Code execution tools close the loop between generating code and observing its effects: the model writes a snippet, the runtime executes it, and the result feeds back into the conversation. This makes them useful for tasks that benefit from precise computation, data manipulation, or dynamic exploration - areas where pure text generation is unreliable. Execution usually happens in an isolated, ephemeral environment with restricted filesystem and network access to limit the blast radius of errors or malicious input.
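The generate-execute-observe loop described above can be sketched as a few lines of Python. This is a minimal illustration, not any specific framework's API: `model` and `execute` are assumed callables standing in for the LLM and the sandbox runtime.

```python
def agent_loop(model, execute, prompt, max_turns=5):
    """Sketch of the generate-execute-observe loop.

    `model` is any callable mapping a transcript to either a code
    request or a final answer; `execute` runs code in a sandbox and
    returns its captured output. Both are assumptions for illustration.
    """
    transcript = [prompt]
    for _ in range(max_turns):
        reply = model(transcript)
        if reply["type"] == "final":
            return reply["text"]
        result = execute(reply["code"])                    # run in the sandbox
        transcript.append(f"stdout: {result['stdout']}")   # feed output back as context
    return None  # turn budget exhausted without a final answer
```

The key property is that execution results re-enter the transcript, so the model's next step can react to what the code actually did rather than what it was predicted to do.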
Unlike API-style function tools where schemas constrain what the agent can request, code execution tools have no schema boundary between the model's output and the executed action - generated code can do anything the sandbox permits. This makes sandbox isolation the critical security control. Code execution tools are a primary surface for unauthorized code execution threats, since the model's output is evaluated as live code. Common mitigations include sandboxing, resource limits, and tool execution approval.
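A minimal executor shows where these mitigations attach, here only a wall-clock timeout, a throwaway working directory, and Python's isolated mode. This is a sketch under that assumption; a production sandbox would add OS-level isolation (containers, seccomp, network restrictions) that a single subprocess call cannot provide.

```python
import subprocess
import sys
import tempfile

def run_python(code: str, timeout: float = 5.0) -> dict:
    """Execute a Python snippet in a fresh subprocess and capture its output.

    Minimal sketch only: timeout caps runtime, the temp directory scopes
    file writes, and -I (isolated mode) ignores the caller's environment
    and site-packages. Real deployments layer container isolation on top.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],
                capture_output=True,
                text=True,
                timeout=timeout,   # resource limit: wall-clock cap
                cwd=workdir,       # confine file writes to a throwaway directory
            )
            return {"stdout": proc.stdout, "stderr": proc.stderr,
                    "exit_code": proc.returncode}
        except subprocess.TimeoutExpired:
            return {"stdout": "", "stderr": "execution timed out", "exit_code": -1}
```

Because the returned dict carries stdout, stderr, and the exit code, the agent runtime can pass all three back to the model as observation context.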
Examples
- OpenAI Code Interpreter (provider-executed): runs Python in a server-side sandbox during inference and returns results inline.
- Anthropic's bash/shell and code execution tools (provider-defined): the model emits structured calls; the developer's agent runtime executes them locally or in a remote sandbox.
- Custom function tools that wrap a language runtime (e.g., a run_python(code) function tool backed by a Docker container).
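A custom function tool of this kind has two halves: a schema advertised to the model and a dispatcher that routes the model's call to the sandbox. The sketch below is hypothetical, the tool name, schema fields, and `executor` callable are illustrative rather than any provider's exact API; the executor stands in for whatever isolation layer (e.g., a Docker container per call) the runtime provides.

```python
# Hypothetical tool definition in the JSON-schema style most LLM APIs
# use for function calling. Field names are illustrative.
RUN_PYTHON_TOOL = {
    "name": "run_python",
    "description": "Execute a Python snippet in an isolated container "
                   "and return its stdout, stderr, and exit code.",
    "input_schema": {
        "type": "object",
        "properties": {
            "code": {"type": "string",
                     "description": "Python source to execute"},
        },
        "required": ["code"],
    },
}

def handle_tool_call(name: str, arguments: dict, executor) -> dict:
    """Dispatch a model-emitted tool call to the sandboxed runtime.

    `executor` is an assumed callable wrapping the actual isolation
    layer (e.g., running the code inside a Docker container).
    """
    if name != "run_python":
        raise ValueError(f"unknown tool: {name}")
    return executor(arguments["code"])
```

Note that the schema only constrains the call's shape (a single string argument); it places no limit on what the code itself does, which is why the executor, not the schema, is the security boundary.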
Synonyms
code interpreter, code execution sandbox tool