OpenAI Cookbook: how OpenAI documents its own API

January 8, 2023

repo-review

by Florian Narr


The OpenAI Cookbook is a collection of Jupyter notebooks, MDX articles, and guides showing how to work with the OpenAI API — covering embeddings, fine-tuning, function calling, agents, evals, and more. It lives at cookbook.openai.com and currently has 72k stars.

Why I starred it

Most API documentation shows you what's possible. The Cookbook shows you how to actually do it — with runnable notebooks, real outputs, and patterns that go further than the reference docs. When I was starting to build with the OpenAI API in early 2023, this was the most useful thing OpenAI had published. Not because it was complete, but because it was honest: here's a notebook, run it, see what happens.

What caught my attention as a developer was less the content and more the infrastructure around it. This isn't just a folder of notebooks thrown in a repo. It's a structured publication pipeline.

How it works

The core of the repo isn't the notebooks — it's registry.yaml. Every notebook or article that appears on cookbook.openai.com must be registered there with a title, path, slug, tags, authors, and date. The schema is enforced by .github/registry_schema.json, which marks title, path, slug, tags, and authors as required:

"required": ["title", "path", "slug", "tags", "authors"]

The website rebuild is triggered by a single GitHub Actions workflow in .github/workflows/build-website.yaml — a curl POST to a deploy hook on every push to main. That's it. No build step in the repo itself; the site generation is external.
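A workflow that does nothing but fire a deploy hook is only a few lines. This is a sketch of the shape described above, not the repo's actual file; the secret name and step layout are assumptions:

```yaml
# Sketch of a deploy-hook workflow (secret name is hypothetical)
name: Build website
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      # POST to the external site builder's deploy hook;
      # the URL lives in a repository secret
      - run: curl -s -X POST "${{ secrets.DEPLOY_HOOK_URL }}"
```

The design choice is worth noting: keeping site generation out of the repo means contributors never fight a local build, at the cost of not being able to preview the rendered site from a PR.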

Where the repo does have its own CI is notebook validation. .github/scripts/check_notebooks.py uses nbformat to validate any changed .ipynb files on every pull request:

import subprocess
from pathlib import Path

def get_changed_notebooks(base_ref: str = "origin/main") -> list[Path]:
    # List .ipynb files that differ from the base branch
    result = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "--", "*.ipynb"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [Path(line.strip()) for line in result.stdout.splitlines() if line.strip()]

It only validates changed notebooks — not the whole repo. Practical choice: with hundreds of notebooks spanning years, full repo validation would block every PR on a legacy notebook with a bad cell.
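The real script hands each changed file to nbformat for full schema validation. As a stdlib-only sketch of the kind of structural check involved (nbformat validates the complete notebook schema, not just the top-level keys shown here), the per-file step looks roughly like this:

```python
import json
from pathlib import Path

# Top-level keys a v4 notebook must carry. A rough stand-in for
# nbformat.validate(), which checks the full JSON schema.
REQUIRED_KEYS = {"cells", "metadata", "nbformat", "nbformat_minor"}

def check_notebook(path: Path) -> list[str]:
    """Return a list of problems found in one .ipynb file."""
    try:
        nb = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"{path}: not valid JSON ({exc})"]
    missing = REQUIRED_KEYS - nb.keys()
    return [f"{path}: missing top-level key {key!r}" for key in sorted(missing)]
```

A notebook with a truncated cell or a merge-conflict marker fails the JSON parse immediately, which is exactly the class of breakage this CI step is there to catch.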

The authors.yaml file is a separate concern — it maps GitHub handles to display names, websites, and avatars. Authors not listed there fall back to GitHub profile data. The schema is again enforced via .github/authors_schema.json. This two-tier author system (explicit overrides + GitHub profile fallback) is a clean pattern for repos with both internal and external contributors.
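The two-tier lookup is simple enough to sketch. The field names and the fallback URL scheme below are assumptions for illustration, not taken from the repo's actual site code:

```python
# Sketch of the two-tier author lookup: explicit authors.yaml entries
# win; unknown handles fall back to GitHub profile data.
# Field names and fallback URLs are illustrative assumptions.

AUTHORS = {  # stands in for the parsed authors.yaml
    "jdoe": {
        "name": "Jane Doe",
        "website": "https://example.com",
        "avatar": "https://example.com/avatar.png",
    },
}

def resolve_author(handle: str) -> dict:
    if handle in AUTHORS:
        return AUTHORS[handle]
    # Fallback: derive display data from the GitHub handle
    return {
        "name": handle,
        "website": f"https://github.com/{handle}",
        "avatar": f"https://avatars.githubusercontent.com/{handle}",
    }
```

The nice property is that external contributors get a sensible default page with zero setup, while anyone who cares can override it with one YAML entry.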

Content is split between examples/ (standalone notebooks) and articles/ (usually partner-contributed content in longer form). There's also a partners/ subdirectory under examples/ for third-party integrations — Cerebras, Promptfoo, and others. Community contributions are explicitly welcomed but reviewed "on a best-effort basis" per the CONTRIBUTING.md, which is at least honest.

Using it

The main usage is reading and running notebooks. Clone the repo, set OPENAI_API_KEY, and open any notebook in Jupyter or VS Code:

git clone https://github.com/openai/openai-cookbook
cd openai-cookbook
export OPENAI_API_KEY=sk-...
jupyter notebook examples/How_to_count_tokens_with_tiktoken.ipynb

If you're looking for a specific pattern — say, handling rate limits — there's a dedicated notebook (examples/How_to_handle_rate_limits.ipynb). Same for token counting, streaming, embeddings for classification, clustering, entity extraction from long documents, and dozens more.
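The core idea of the rate-limit notebook is retrying with exponential backoff plus jitter. This is a minimal stdlib sketch of that pattern, not the notebook's exact code; in practice you'd narrow `retry_on` to the API client's rate-limit error type:

```python
import random
import time

def with_backoff(call, max_retries: int = 5, base: float = 1.0,
                 retry_on: tuple = (Exception,)):
    """Retry `call`, sleeping base * 2^attempt seconds (with jitter)
    between attempts. Re-raises after the last failed attempt."""
    for attempt in range(max_retries):
        try:
            return call()
        except retry_on:
            if attempt == max_retries - 1:
                raise
            # Jitter spreads out retries from concurrent clients
            time.sleep(base * (2 ** attempt) * (1 + random.random()))
```

Usage would be something like `with_backoff(lambda: client.chat.completions.create(...))`, where `client` is whatever API client you're wrapping.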

The tiktoken notebook is a good example of the format. It starts with a clear explanation, walks through the three encoding families (o200k_base, cl100k_base, p50k_base) mapped to specific models, then gives runnable code with actual output. No fluff.

Rough edges

The main tension is that this is a first-party OpenAI repo, but a significant portion of content comes from community contributors who have no obligation to update their notebooks when the API changes. A notebook written for gpt-3.5-turbo in 2023 may still reference deprecated parameters or endpoints. There's no systematic versioning or deprecation tracking at the notebook level.

The CONTRIBUTING.md is sparse — it says contributions are reviewed on a best-effort basis and to "stay tuned." That's honest, but it also means the contribution bar is opaque. There's no automated notebook execution in CI, so a merged notebook could have broken code that nobody catches until someone runs it.

The registry.yaml grows unbounded. At current scale (~200+ entries), it's manageable as a flat YAML file. It doesn't have pagination, archiving logic, or category navigation beyond tags.

Bottom line

If you're building with the OpenAI API and want working reference code beyond the official docs, this is the first place to look. The infrastructure around the notebooks — registry, schema validation, author system — is worth studying if you're building your own documentation-as-code setup.

openai/openai-cookbook on GitHub