How the official OpenAI Python SDK is built

March 4, 2023

repo-review

by Florian Narr


The official Python library for the OpenAI API. Synchronous and async clients, full type coverage, streaming, automatic retries — the surface area you'd expect from a first-party SDK.

Why I starred it

I was curious how OpenAI handles the infrastructure concerns that every API client has to deal with: retries with backoff, streaming SSE, distinguishing None from "not passed". These problems are boring until you get them wrong. I wanted to see what a well-resourced team does when they sit down to build this stuff properly.

The short answer: the client itself is code-generated by Stainless from an OpenAPI spec, but the plumbing underneath — the base client, the streaming layer, the retry logic — is handwritten and worth reading.

How it works

The package splits cleanly into two layers. The resource classes in src/openai/resources/ are generated: they're thin wrappers that build TypedDict request params and delegate to the base client. All the interesting logic lives in _base_client.py, _streaming.py, _types.py, and a handful of utilities.

The NotGiven sentinel (_types.py:129) is the first thing worth stopping at. It solves a real problem: for some parameters, None is a valid value with semantic meaning, so you can't use None as the "not provided" sentinel. The library defines its own type for this:

class NotGiven:
    """
    For parameters with a meaningful None value, we need to distinguish between
    the user explicitly passing None, and the user not passing the parameter at all.
    """
    ...

not_given = NotGiven()
NOT_GIVEN = NotGiven()

This lets the SDK strip unset fields before serialization without accidentally stripping intentional None values. It's a small thing done right.
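The mechanics are easy to replicate in your own clients. A minimal sketch of the pattern, recapping the sentinel and adding a hypothetical `strip_not_given` helper (not the SDK's actual serializer):

```python
class NotGiven:
    """Sentinel distinguishing 'parameter omitted' from an explicit None."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


def strip_not_given(params: dict) -> dict:
    """Drop omitted fields while keeping intentional None values."""
    return {k: v for k, v in params.items() if not isinstance(v, NotGiven)}


# user=None survives; temperature was never passed, so it's dropped
body = strip_not_given({"model": "gpt-4o", "user": None, "temperature": NOT_GIVEN})
# body == {"model": "gpt-4o", "user": None}
```

Making `__bool__` return False means defaulted parameters are also falsy in ordinary `if param:` checks, which keeps downstream code unsurprising.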

The PropertyInfo annotation pattern in _utils/_transform.py is how the SDK handles Python's snake_case-to-camelCase mapping without magic. You annotate a TypedDict field with PropertyInfo(alias='accountHolderName') and the transform layer handles the rename before sending. Same mechanism handles ISO 8601 date formatting and base64 encoding — metadata lives in the type annotation, not in ad hoc serialization code scattered across resource methods.
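A simplified reimplementation of the idea, handling only the alias case (the SDK's actual transform layer also covers date formats and base64; the `TransferParams` type here is an illustrative example, not from the library):

```python
from typing import Annotated, Optional, TypedDict, get_type_hints


class PropertyInfo:
    """Metadata attached to a TypedDict field via Annotated."""

    def __init__(self, alias: Optional[str] = None):
        self.alias = alias


class TransferParams(TypedDict):
    account_holder_name: Annotated[str, PropertyInfo(alias="accountHolderName")]


def transform(data: dict, params_type: type) -> dict:
    """Rename keys according to PropertyInfo alias annotations."""
    hints = get_type_hints(params_type, include_extras=True)
    out = {}
    for key, value in data.items():
        alias = key
        hint = hints.get(key)
        # Annotated[...] types expose their metadata via __metadata__
        if hint is not None and hasattr(hint, "__metadata__"):
            for meta in hint.__metadata__:
                if isinstance(meta, PropertyInfo) and meta.alias:
                    alias = meta.alias
        out[alias] = value
    return out


transform({"account_holder_name": "Ada"}, TransferParams)
# → {"accountHolderName": "Ada"}
```

The payoff is that the wire-format knowledge lives next to the type definition, so generated resource methods never need per-field serialization code.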

Retry logic in _base_client.py is well-thought-out. The _should_retry method handles the full set: 408 (request timeout), 409 (lock timeout), 429 (rate limit), and 5xx. It also obeys a non-standard x-should-retry response header, so the API can override retry behavior per-endpoint. Backoff is exponential with jitter:

def _calculate_retry_timeout(self, remaining_retries, options, response_headers=None):
    # Honor a server-provided Retry-After if it's within a sane window
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    # Exponential backoff, capped at MAX_RETRY_DELAY, with jitter
    max_retries = options.get_max_retries(self.max_retries)
    nb_retries = min(max_retries - remaining_retries, 1000)
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    jitter = 1 - 0.25 * random()
    return sleep_seconds * jitter

The _parse_retry_after_header method handles three formats: retry-after-ms (non-standard milliseconds), retry-after as float seconds, and retry-after as an RFC 2822 date. Most clients only handle the integer seconds case.
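A sketch of that three-format fallback chain as a standalone function (header names as described above; the exact precedence is my reconstruction, not the SDK's method verbatim):

```python
import email.utils
import time
from typing import Optional


def parse_retry_after(headers: dict) -> Optional[float]:
    """Parse a retry delay in seconds from response headers.

    Tries, in order: retry-after-ms (milliseconds), retry-after as a
    float number of seconds, retry-after as an RFC 2822 date.
    """
    ms = headers.get("retry-after-ms")
    if ms is not None:
        try:
            return float(ms) / 1000
        except ValueError:
            pass

    value = headers.get("retry-after")
    if value is None:
        return None
    try:
        return float(value)  # e.g. "2" or "1.5"
    except ValueError:
        pass
    try:
        # e.g. "Wed, 21 Oct 2026 07:28:00 GMT" -> seconds from now
        dt = email.utils.parsedate_to_datetime(value)
        return dt.timestamp() - time.time()
    except (TypeError, ValueError):
        return None
```

The date branch is the one most clients skip, even though it's the oldest documented form of the header.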

Streaming (_streaming.py) wraps the SSE decoding from httpx-sse and adds OpenAI-specific handling on top. The Stream.__stream__ generator terminates on [DONE], surfaces inline errors from the stream body before they get swallowed, and handles the Assistants API's thread.* event namespace separately from the standard chat streaming format. Both sync (Stream) and async (AsyncStream) implementations exist as separate generic classes.
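The termination and error-surfacing logic is easy to sketch in isolation. A minimal sync version operating on raw `data:` lines, not the SDK's actual Stream class:

```python
import json
from typing import Iterator


def iter_chunks(sse_lines: Iterator[str]) -> Iterator[dict]:
    """Yield parsed chunks from raw SSE 'data:' lines, stopping at [DONE]
    and surfacing inline error payloads instead of swallowing them."""
    for line in sse_lines:
        if not line.startswith("data:"):
            continue
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        payload = json.loads(data)
        # the API can deliver an error object inside the stream body
        if isinstance(payload, dict) and payload.get("error"):
            raise RuntimeError(f"API error in stream: {payload['error']}")
        yield payload


raw = [
    'data: {"choices": [{"delta": {"content": "hi"}}]}',
    "data: [DONE]",
]
chunks = list(iter_chunks(iter(raw)))
# chunks == [{"choices": [{"delta": {"content": "hi"}}]}]
```

The real implementation layers the same checks onto httpx-sse's event decoding and additionally branches on event names for the Assistants API.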

The retry count header is a detail I appreciated: every request that's been retried gets an x-stainless-retry-count header injected (line 454), unless the caller already set one. That makes retries observable in server-side logs without any client-side instrumentation.
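The pattern is simple to lift for your own clients. A hypothetical standalone version over a plain headers dict:

```python
def inject_retry_count(headers: dict, retries_taken: int) -> dict:
    """Add x-stainless-retry-count on retried requests,
    unless the caller already set one explicitly."""
    if retries_taken > 0 and "x-stainless-retry-count" not in headers:
        headers = {**headers, "x-stainless-retry-count": str(retries_taken)}
    return headers
```

Server-side, grouping logs by this header immediately shows which endpoints are forcing clients into retry loops.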

Using it

Basic chat completion:

from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from env

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "explain backpressure in two sentences"}],
)
print(completion.choices[0].message.content)

Streaming with the async client:

from openai import AsyncOpenAI
import asyncio

async def stream_response():
    client = AsyncOpenAI()
    stream = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "count to 5"}],
        stream=True,
    )
    async for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)

asyncio.run(stream_response())

Configuring timeout per-request and retries via with_options (max_retries is a client-level setting, not a method parameter):

response = client.with_options(max_retries=5).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "hello"}],
    timeout=30.0,
)

Rough edges

The library is generated from an OpenAPI spec, which means the resource classes themselves are not great to read or contribute to — they're not meant to be edited by hand. If you want to understand what parameters a method accepts, api.md or the type stubs are easier than reading resources/.

Pydantic v1 and v2 are both supported via a compatibility shim in _compat.py. The shim is functional but adds a layer of indirection that makes some model behavior harder to trace. This will presumably go away when v1 support is eventually dropped.

The helpers/ directory contains the higher-level streaming wrappers (like stream() context managers) which are more ergonomic than using Stream directly, but they're not prominently documented. You'll find them by reading the source before you find them in the docs.

Bottom line

If you're calling the OpenAI API from Python, this is what you use — it's not a choice. But reading the source is worthwhile: the NotGiven sentinel, the PropertyInfo transform pattern, and the retry infrastructure are all patterns you can lift for your own API clients.

openai/openai-python on GitHub